How To Lose Credibility When Using Data

I have been saying for a while that data is the next big thing.

No, not data by itself (which we have been talking about forever) or even “Big Data” (which is thankfully past the peak of its hype and going back to its maiden name: Data).  We are talking about how we use data, what we do with it, and how it leads us to better decisions.

Almost two years ago, while I was working with Attensity researching the issues that plagued analytics, I wrote a great piece (if I may say so) titled “How Accuracy Matters in Analytics”.  I looked at the bias humans introduce into data and analytics and how we are the cause of poor results.  If you have time to read it you’ll be a better person (OK, not that much – but it will give you something to think about – post is here), and it offers some concrete steps you can take to remove bias from your data projects.

Data is wonderful.  Data gives you credibility.  Data makes you smart.

Data, unfortunately, is easy to twist around.

If you want to find a data point to support your theory – whatever it may be – I promise you can find data to support it.  You can get 90% customer satisfaction very easily (formula in a post I wrote long ago).  More and more of my clients, vendors and non-vendors alike, now want specific data to support their theories.  I can create a survey or study or research project to support basically any statement you want to make, as long as we can agree on certain specifications.

Data can be found to support what you are saying; always has, always will.  The problem is not finding the data, it is using it. I am seeing more and more people use data poorly.  Instead of making contextual statements, people are using absolute statements about the data they have.

Making data absolute is so incredibly wrong.

If you remember nothing else about data, remember this point: data is contextual, not absolute.

Anything you measure has to be measured in context to be valid.  I am not just talking about demographics (that is so old school – it still works in some cases, but in today’s world demographics have become too fragmented to be reliable, and long-tail analytics has drastically changed that), but about differentiated segments.

When you introduce or use data you must provide context.

Context is what makes you smart, not just the data.  If you can properly use data in context, you will be far smarter than just using data.

Here are a few examples.  A few days or weeks ago (I don’t keep track of time anymore, it reduces the stress) my good friend Emanuele Quintarelli (@absolutesubzero on Twitter) said in the aforementioned social network:

That sparked a conversation between us where I said that the data was not wrong, just biased.  I had not yet read the report, to be honest, but I have since, and my opinion has not changed.  The data is biased by the theory being proven (people complain in private channels more than in public channels) and by how the questions were asked: people using private channels were asked to reply.  The report is behind Forrester’s paywall (it is worth reading if you can get it; in spite of its bias it is an interesting report), so you may not be able to get it – but the questions are structured so that the results come out exactly the way you see them above.  The people selected to reply to this survey, and the manner in which the survey was conducted, biased the answers towards private channels.  If the exact same survey had been conducted in public social networks, the results would have been different – as they are in other studies asking similar questions of different respondents.
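To make the mechanism concrete, here is a minimal Python sketch of how the channel used to recruit respondents can shift the answer to the same question. The population and its counts are invented for illustration; they are not taken from the Forrester report or any other study, and `share_using` and `reached_through` are hypothetical helper names.

```python
# Hypothetical population: the channel types each person uses to complain.
# These counts are invented for illustration, not taken from any report.
population = (
    [{"private", "public"}] * 400   # use both kinds of channel
    + [{"private"}] * 300           # only phone, email, letter, ...
    + [{"public"}] * 300            # only reviews, tweets, forums, ...
)

def share_using(channel, respondents):
    """Fraction of respondents who report using the given channel type."""
    return sum(channel in person for person in respondents) / len(respondents)

def reached_through(channel):
    """A survey distributed through one channel only reaches its users."""
    return [person for person in population if channel in person]

for label, sample in [
    ("everyone", population),
    ("recruited via private channels", reached_through("private")),
    ("recruited via public channels", reached_through("public")),
]:
    print(f"{label}: private={share_using('private', sample):.0%}, "
          f"public={share_using('public', sample):.0%}")
```

In this made-up population both channel types are used by the same share of people, yet each recruitment route makes its own channel look dominant. The number only means something once you say who was asked.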

Another example is the Harris Interactive Customer Experience Impact Report from 2009, which is widely quoted (more specifically, the data point that says that 86% of consumers would change a service provider after just one bad experience).  Again, there is significant bias in this survey, but my point is not how the data was collected (it was biased towards generating a large number of positive responses by removing follow-up questions and context – e.g. nothing was ever said about whether they DID CHANGE the service provider) but how it was used.  The argument goes that since people would change providers after one bad experience (which was not defined either), companies should focus all their efforts on providing better experiences.  Whether or not this is true, the data was used to showcase a doomsday scenario to propel people to act on something that may or may not be a problem.

Same argument goes against my favorite evil-fuzzy metric: NPS scores.

I could continue giving you examples for a long time, but you get the idea.

Data builds credibility.  It also takes it away if not used wisely.  Go forth and use data – just be careful about how you use it.


10 Replies to “How To Lose Credibility When Using Data”

  1. Hi Esteban, I agree with you except on the point that data is not absolute. I really think that data is “absolutely” absolute, because it’s only the way we gather or present it that is contextual. It’s our interaction with the measurement methodology that causes subjectivity, and data is not guilty of that. Don’t you agree?


    1. Agree with most of what you are saying – but the statement that data is absolute does not fly with me. Maybe we are meant to disagree there.

      There are tons of examples of bad use of data for lack of context (for example, taking two disparate data items like the amount of garbage produced and how late people go to bed, then claiming a relationship between them simply because they happen to trend together when there is no causal link).
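A small Python sketch of that trap, using two invented series (the names `garbage_kg` and `bedtime_min`, the drift rates, and the noise levels are all assumptions made for illustration). The series are generated independently and share only an upward drift over time, so comparing raw levels and comparing week-over-week changes give very different pictures.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
weeks = range(200)

# Two made-up series that share nothing except an upward drift over time.
garbage_kg = [100 + 2.0 * t + rng.gauss(0, 5) for t in weeks]
bedtime_min = [30 + 0.5 * t + rng.gauss(0, 3) for t in weeks]

def changes(series):
    """Week-over-week differences, which strip out the shared trend."""
    return [b - a for a, b in zip(series, series[1:])]

levels = pearson(garbage_kg, bedtime_min)
deltas = pearson(changes(garbage_kg), changes(bedtime_min))
print(f"correlation of raw levels:     {levels:.2f}")
print(f"correlation of weekly changes: {deltas:.2f}")
```

The raw levels move together almost perfectly because both drift upward, while the weekly changes show next to no relationship. Without the context of how each series was produced, the first number invites exactly the wrong conclusion.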

      Data must be used in context to be valid, I cannot get away from that.

      But as I wrote in the original post – analytics are biased due to our methodologies, and lack of contextuality in using the data always stems from the collection through the reporting – so there may be something in what you say.

      To say data is absolute is to say that 20% of people in the world are pirates. Without qualification and context, there is no way that is true. With caveats and context (measured among young men between 16 and 25 in Ethiopia), it makes more sense.



  2. Hi Esteban,
    while I tend to agree with the overall rant about the distorted usage of data, I’m not sure I get your comment above about private vs public channels.

    The question asked was “In which of the following ways have you provided feedback or complained about unsatisfactory customer service interactions in the past 12 months?”. People using both private and public channels were asked to reply. Questions have been asked through an online survey and the data has been weighted to be representative of the population.

    Just to clarify: Forrester considers traditional channels (survey, phone call, email, online chat, or letter) as private and compares them to social channels (reviews, status updates, post in forums, comments on FB pages, blog posts, public tweets, private messages on Twitter).

    The outcome is pretty well aligned with other surveys I’ve seen recently: traditional channels are used far more than social channels. While social channels collectively reach 26%, each single channel (e.g. tweets) stops at less than 8%.

    Looking forward to your clarification.



    1. Emanuele,

      without talking to the author of the original study (as far as I can tell it goes back many years and keeps getting repeated) and evaluating the methodology in detail we are just having a theoretical discussion – one in which neither of us can sway the other.

      another example, btw, of data being contextual — your argument above makes sense in the context you present — but without knowing whether that is the correct context (knowing what went into it and how the study was done), it is hard to gather the value of the report.

      As for the statistics, I continue to see reports that point to different statistics – more time spent complaining online. I don’t have any handy (but would be glad to find one if you want), but the latest I remember is that near 70% of people use public channels, as defined by them above, to complain, and the rest use private channels. I am more inclined to believe those than 26% for public channels…

      Alas, contextually speaking we are more or less just talking at this point…

      you do make great points, but my post was not meant to take a stab at Forrester’s report; I was more or less using it as one of several examples.

      thanks for the reply.


      1. just found during a quick search one such study

        speaks to some 45% or more of people taking to the social channels to complain and get service – i am certain there are more like this. my own research points to the number somewhere in the 60% range…

        that was what started my rant: the data was improperly used, and the lack of context made it sound absolute when in reality it was just another opinion.


  3. I love this post. So many opportunities for comment. Forgive me for being disparate here.

    Regarding your example of people changing service providers: Is that the right metric to use or should we measure the behavior, expressed as those who actually did change providers?

    In the nonprofit sector we frequently hear, “X% of people surveyed said they’d be ‘interested’ in volunteering.” But when we ask them specifically TO volunteer, the number who actually follow through is minuscule. (It’s an emotional thing, as is switching service providers.)

    Now, to context. How about this data point? We have 1 bazillion customers in our db.
    Now that’s something to be proud of, right?
    Actually, only 50,000 of them have purchased from us in the past 5 years.
    Only 5,000 of them have purchased from us twice or more in that time period.

    If you devise a business strategy based upon the bazillion customer point, it will differ markedly from one based upon the second or the second and third points.

    I’m guessing that there are just as many strategies created based upon the “bazillion” data, as there are on the other two data points.
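The same three data points can be sketched in a few lines of Python. The customer ids and purchase log below are invented for illustration (a tiny stand-in for the bazillion-record database), and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical data: every customer record in the db, plus one customer id
# per purchase made in the last five years. All values are invented.
customers_in_db = [f"c{i}" for i in range(1000)]
recent_purchases = ["c1", "c2", "c2", "c3", "c7", "c7", "c7", "c9"]

purchases_per_customer = Counter(recent_purchases)

total = len(customers_in_db)
# Customers who bought at least once in the period.
active = len(purchases_per_customer)
# Customers who bought twice or more in the period.
repeat = sum(1 for count in purchases_per_customer.values() if count >= 2)

print(f"records in db:    {total}")
print(f"active customers: {active} ({active / total:.1%} of records)")
print(f"repeat customers: {repeat} ({repeat / active:.1%} of active)")
```

Each of the three numbers is true. Which one you report, and against which denominator, decides the strategy that follows from it.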

    One last comment. Regardless of whom you favored in the US presidential election, isn’t the Obama team’s use of data in the battleground states THE best example of how to use data to accomplish your business goals?



    1. excellent comment, and great insights as always – i know you get data, and you gave a great example.

      the difference between the Republican and Democratic use of data in this election was made public in different places, and i believe you are correct – the Democrats used data far more wisely and rode that to success. it was not about politics, it was about understanding the value of data in context (rallying people to go vote in a state they had already won, for example, would not have led to success equally well – context of what they know about what and where, and how to counter by using appropriate data).

      thanks for reading,


  4. I think in addition to the context argument (with which I concur) data also needs to be shown to be valid – or legitimate. I mean, the culture of the web (fast-paced, often anonymous, and sometimes lacking in responsibility and thus morality) means we get bombarded with ‘snippets’, if you will, of data. As a result, great inference is made from a tiny snippet of data selected from a wider body of data which may point in completely the other direction. So I’m keen for social researchers to do two things: yes, give context, but also be consistent in their validation of data – and be sure to test that ‘snippets’ legitimately reflect themes. Great post. Thanks.


  5. “Context is what makes you smart, not just the data. If you can properly use data in context, you will be far smarter than just using data.”

    Excellent point. You can make raw data say just about anything you want when you apply the right filters. But does having a conclusion and finding data to fit it actually help your business? It’s hard not to have a little bias when analyzing data (we are human after all), but the analysis needs to be as straightforward as possible to provide real insights.


    1. i am having a problem with one part of your comment:

      “You can make raw data say just about anything you want when you apply the right filters”

      That is not necessarily true, you cannot get data to say what it is not. If you have one answer of 1,000 saying one thing – you cannot make that single piece of data stand out as the answer. This is not a question of filters, it is a question of numbers.

      As for the question, the proper use of data is to prove (or disprove, as I mentioned above) a theory. You have to have a question that you need answered, so collecting the data for the purpose of answering questions is the way to go. The concept that you can collect data and throw it at a “black box” that will find something for you that you did not know is as ludicrous as it is wrong. Data is there to answer your questions, if placed in the proper order.

      Thanks for commenting…

