I wrote about using analytics to get more value out of communities and how to use analytics to get to know your customers better. However, the more I think about it, the more I see the accuracy of the analysis as one of the most relevant issues for analytics. Continuing the series on the topics that matter for analytics, I want to look at the issue of accuracy and its effect on the business.
Accuracy is the rating that defines how true the analysis performed by the computer is, without human intervention. For social analytics, that means how real the computer’s perception is that a tweet or blog post has a positive or negative inclination. The only way to measure accuracy is to compare the results of the computer analysis with a similar analysis done by humans. In other words, did the computer pick the same thing a human would have picked?
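To make that comparison concrete, here is a minimal sketch of accuracy as the share of items where the computer's label matches the human's. The function name `accuracy` and the five sample labels are hypothetical and only there for illustration; they are not real results.

```python
def accuracy(machine_labels, human_labels):
    """Share of items where the machine label equals the human label."""
    if len(machine_labels) != len(human_labels):
        raise ValueError("both label lists must cover the same items")
    if not human_labels:
        raise ValueError("need at least one labeled item")
    matches = sum(m == h for m, h in zip(machine_labels, human_labels))
    return matches / len(human_labels)


# Hypothetical labels for five posts (illustrative data only).
human = ["positive", "negative", "negative", "positive", "neutral"]
machine = ["positive", "negative", "positive", "positive", "neutral"]

score = accuracy(machine, human)
print(f"agreement with human reviewers: {score:.0%}")
```

Note that the human labels are treated as the ground truth here, which is exactly the assumption the rest of this post questions.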
We only rate accuracy as the capacity of a computer to mimic a human brain and we can only conceive that a computer can be accurate when copying how a person thinks. Alas, we forget to consider the inherent bias built into computer calculations: the one provided by humans programming such systems.
Computers only do what we tell them to do. They have (almost) infinite computational power, and can apply any set of rules to any computational variables. This means that if we tell computers that a specific word or combination of words means something positive, then the computer cannot make it mean something negative. In other words, we are not really rating the computer’s ability to determine a sentiment; we are rating whether humans did a good job, or not, in biasing the computer to pick that sentiment. This means we can accurately predict the outcome the computer will select before the first variable is computed against the first rule.
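A toy example shows how this plays out. The sketch below is not how any particular analytics product works; the word list `RULES`, its polarities, and the `classify` function are all hypothetical. The point is only that the outcome is fixed by whoever wrote the rules.

```python
# Hypothetical rule set: every word and its polarity was chosen by a person.
RULES = {"love": 1, "great": 1, "reliable": 1, "hate": -1, "broken": -1, "sick": -1}


def classify(text, rules=RULES):
    """Sum the human-assigned polarity of each known word in the text."""
    words = (w.strip(".,!?").lower() for w in text.split())
    score = sum(rules.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


post = "That new model is sick!"

# The rules say "sick" is negative, so the verdict is known before running.
print(classify(post))

# Change the human-written rule and the "computer's opinion" changes with it.
slang_rules = {**RULES, "sick": 1}
print(classify(post, slang_rules))
```

The same post gets opposite verdicts, and the only thing that changed was a decision made by a person.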
Accuracy is not a computer function but a human bias, and its “manipulation” can have great effects on the destiny of the business. Learning to reduce the bias will yield better results for analytics, and it is where the effort should be spent, so that you can get real value out of the analysis and know where to take action. Taking action on inaccurate data can be a disaster.
Reducing the bias comes down to the two core elements of analytics: what we analyze, and what we compare it to. What we analyze has two variables: data and rules. What we compare it against has two variables as well: taxonomy (categories) and ontology (definitions).
Let’s examine how we add bias to these four variables:
- The data we choose to examine will determine the outcome of the analysis very quickly. Whether we choose to analyze results from a survey, blog posts we picked up, tweets, or other expressions of human opinion, finding where the data to be analyzed resides is the most critical part of the analysis. If we know that our customers frequent a specific community to discuss our products and services, analyzing Twitter will not yield good results. We tend to spread our listening across a myriad of communities, a very large number of which don’t really matter. Listening in the wrong place biases the results by watering down the overall positive and negative opinion.
- The rules are where most organizations introduce their bias. This is where you determine whether a specific input matters or not, and how it should be counted. Domain expertise and knowledge of the business remove the bias at this stage. For someone well versed in the event being analyzed, nothing is easier than determining which rules matter and which do not. Some analytical engines have automated technology for creating these rules.
- The taxonomy is typically used to organize the insights found (sentiment, issues, etc.) into categories. A few of the available products offer pre-defined taxonomies to which organizations add categories based on their product lines (e.g. cars, motorcycles, trucks) or business functions (e.g. sales, accounting, shipping). Different customers have different categories or different ways to identify products and services.
- The ontology (sometimes also called a topic domain) is simply word definitions. These overgrown dictionaries, which also include thesaurus and contextual word functions, determine whether the sentiment analysis was performed on the right terms and concepts. For example, the definition of the word “tire” in one industry might differ from its definition in another. One of the things that improves the ontology is a capability for disambiguation, a computer algorithm used to better interpret the content and identify the sentiment and the issues it refers to.
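The “tire” example from the ontology bullet can be sketched as a lookup keyed by domain. Everything here is an assumption made for illustration: the `ONTOLOGY` structure, the two domains, the definitions, and the polarity values are hypothetical, and real disambiguation engines are far more involved than a dictionary lookup.

```python
# Hypothetical ontologies: the same word gets a different definition
# and polarity depending on the domain it is read in.
ONTOLOGY = {
    "automotive": {"tire": ("wheel component", 0)},
    "fitness": {"tire": ("become fatigued", -1)},
}


def interpret(word, domain, ontology=ONTOLOGY):
    """Return (definition, polarity) for a word in a domain, or None if undefined."""
    return ontology.get(domain, {}).get(word.lower())


print(interpret("tire", "automotive"))
print(interpret("tire", "fitness"))
print(interpret("tire", "banking"))  # no definition for this domain
```

Applying the wrong domain's ontology would count a neutral product mention as a complaint, or the other way around, which is one way bias enters before any rule is applied.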
Now let’s look at the implications of accuracy for the business, using one specific example to show “accurately” (pun badly intended) how the correct set of variables can lead to the right (or wrong) conclusions.
A car manufacturer had a problem: airbags in their cars were deploying under unusual conditions. Even when the cars were not involved in accidents, and the sensors had not taken even a small hit, the airbags were deploying. Upon initial analysis of the data, the conclusion was that the mechanism that deployed the airbag (which costs around $30.00 to replace) may have been at fault. A recall was prepared to exchange it for all affected customers. Right before the recall was issued, the company decided to analyze in more detail the complaints they received from their customers (which explained in good detail how the incidents occurred), and cross-reference them to the detailed repair notes provided by the technicians fixing the airbags after deployment.
A higher degree of accuracy in the analysis was obtained by removing the bias that was introduced when the original customer complaints were translated to mean that the entire mechanism was at fault, and by deconstructing the problem further to focus on each individual component of that mechanism, as detailed in the verbatim repair notes. This allowed the car maker to determine that the problem was not the entire discharge mechanism for the airbag but a small spring that overheated and relaxed under specific circumstances, a spring that cost just $0.25 to replace. Comparing a recall of the whole deployment mechanism with a recall of the tiny spring, a savings of $29.75 per car across all affected vehicles, quickly shows how a more accurate analysis and cross-referencing of the data helps the organization.
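The arithmetic behind that savings figure is simple enough to sketch. The two part costs come from the example above; the number of affected cars is not given, so the 100,000 used below is a purely hypothetical fleet size, and the calculation covers parts only, not labor or logistics.

```python
from decimal import Decimal

MECHANISM_COST = Decimal("30.00")  # approximate cost of the full mechanism
SPRING_COST = Decimal("0.25")      # cost of the spring alone


def recall_savings(cars_affected):
    """Return (savings per car, total savings) in dollars, parts only."""
    per_car = MECHANISM_COST - SPRING_COST
    return per_car, per_car * cars_affected


# 100,000 is a hypothetical fleet size, not a figure from the example.
per_car, total = recall_savings(100_000)
print(f"savings per car: ${per_car}")
print(f"savings across the fleet: ${total:,}")
```

Whatever the real fleet size was, the per-car difference scales linearly with it, which is why the more accurate analysis mattered so much.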
Are you removing biases from your analytics? Improving accuracy? Are you looking at multiple customer data sources and cross-referencing them? How are you doing that? Let me know in the comments below; I would love to see more examples.
This is the third part of a series of six sponsored research reports I am writing for Attensity on how to better leverage analytics in a social business.