Beyond the bias in big data

by Nigel Hollis | May 15, 2013

Kate Crawford, a principal researcher at Microsoft Research and a visiting professor at the MIT Center for Civic Media, has written a provocative post on the HBR Blog titled “The Hidden Biases in Big Data.” She quotes former Wired Editor-in-Chief Chris Anderson as saying, “with enough data, the numbers speak for themselves.” Crawford then asks: can numbers actually speak for themselves?

Crawford’s answer is a simple no. She states:

Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them and define their meaning through our interpretations. Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves.

I agree. Data – big or small – can no more speak for itself than a goldfish can. Big data just makes a long-standing problem… bigger. Data must be cleaned and ordered before it can be used, and what numbers mean depends on how we interpret them. I also agree that what we really need is not big data but, to use Crawford’s term, data with depth. This is what I was trying to get at in my post about big data needing a little help.

When I chatted with my colleague Bill Pink, Senior Partner, Creative Analytics at Millward Brown North America, he suggested that making use of big data, or any data for that matter, comes back to first principles:

What question are we trying to answer? Do we understand the people, psychology, human relationships, the category or phenomena under study? The upside of big data is that we now have previously untapped assets to help us answer these questions – mobile collection of texts, social media, set-top box data on TV viewing… that’s the amazing thing.

And those new data assets can be used to provide a better explanation than if we did not have those data sets to include in the story. But that assumes a framework, analytic approach and tools to evaluate and integrate the data and reach these conclusions. It’s not the presence of the data that matters, it’s the question to be answered and the ability of the new data to take us further than we were before.

Bill continues:

One of the ironies of the buzz around big data is that the folks arguing loudest to keep in mind curiosity, experience and the principles of research… are often the most technically savvy data science types.

To back up his case, Bill points us to a New York Times article titled “Sure, Big Data Is Great. But So Is Intuition.” The article is worth a read, and I love this quote from Claudia Perlich, Chief Scientist at Media6Degrees: “You can fool yourself with data like you can't with anything else.” All too true, I fear.
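Perlich’s warning is easy to demonstrate. As a quick, hypothetical sketch (the data, thresholds and variable names below are my own, not anything from the article or from Perlich), the following Python snippet generates completely random data and still “discovers” apparently significant predictors, simply because it tests enough of them:

    import numpy as np

    # Purely random "big" data: 1,000 observations, 200 candidate predictors,
    # and an outcome that has no real relationship to any of them.
    rng = np.random.default_rng(42)
    n_obs, n_features = 1000, 200
    X = rng.normal(size=(n_obs, n_features))
    y = rng.normal(size=n_obs)

    # Correlate every candidate predictor with the outcome.
    correlations = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])

    # For a single test with n = 1,000, |r| above roughly 0.06 looks "significant" at p < 0.05.
    threshold = 1.96 / np.sqrt(n_obs)
    false_hits = int(np.sum(np.abs(correlations) > threshold))

    print(f"'Significant' predictors found in pure noise: {false_hits} of {n_features}")
    # Expect roughly 5% of 200, i.e. about ten spurious "discoveries" from nothing at all.

More data and more variables only multiply the opportunities for that kind of self-deception, which is exactly why the question and the framework have to come first. What do you think? Please share your thoughts.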

5 comments

  1. David Candy, May 16, 2013
    There was statistical research done on rating horses by lawyers who turned to betting (they were known as the legal eagles). They found that taking into account any more than a few factors rendered the ratings useless.

    I wonder if that's true here too, or perhaps algorithms have advanced. If they have, have they raised the limit or removed it?

    My frequent Google searches related to my hobby generally return useless results, yet Google returns excellent results for my rare searches on pizza, train timetables, etc.

    For my hobby, Google knows I search on computer problems a lot (they're other people's problems). It returns page after page of technically illiterate people with the same problem.

    On my rare searches I suspect it uses nothing but my location to match results.

    The less it knows about me the better the result.

    That's my theory anyway.
  2. Erik du Plessis, May 16, 2013
    It all comes back to "Where is the wisdom?"

    Big data, heavy statistical analysis and the like are all fine.

    At the end of the day, it is all about the wisdom that sets the hypotheses to be tested by big data and the interpretation of the results against reasonable models. That is part of what is called wisdom.

    Wisdom can come from small data. Often wisdom is what sets up the hypotheses that can then be analysed by big data. Mostly, big data is meaningless without wisdom.

  3. Eileen Campbell, May 15, 2013
    I also worry about the "great equalizer" of big data. Once information has been warehoused and made available in a database, we begin to think of it as all being comparable. Our friend Bill Pink constantly reminds us that perhaps the most important of the big data "V"s is veracity. Data is not democratic. It is not all created equal. As we give data its voice, we're well served to remember this. Otherwise we risk becoming myth-builders rather than storytellers!
  4. Tim, May 15, 2013
    Absolutely.

    The idea that we can distill truth with just a huge amount of data entirely ignores that we need frameworks, models, and theories to determine what data to collect about what topics and from whom. We can no more capture the totality of data to subject it to analytic techniques than we can rebuild a city to scale to serve as a map.

    See also the Pig Data fable:
    http://scensci.wordpress.com/2012/12/14/big-data-or-pig-data/
  5. Ed C, May 15, 2013
    While I'm a fan of data, the more I hear about Big Data the more I think about the Tower of Babel. Just because we now have something with more "bricks," will we really get closer to the "truth"? Sure, we've learned things over the years and I'm a fan of progress, but I'd say let's not get carried away in thinking that just because we have more bytes of data we're proportionately more intelligent.
