Where will big data make a big difference?

October 31, 2012

I have no doubt that big data will have a big impact in many areas including market research, but readers of this blog will know that I am skeptical about many of the claims made about the utility of big data. So a Harvard Business Review (HBR) blog post titled “Big Data Hype (and Reality)” was bound to get my attention.

Gregory Piatetsky-Shapiro presents some interesting examples to support his case. For instance, he discusses the Netflix challenge as a case where big data analysis failed to improve significantly on the existing ability to predict user preferences. You may remember that Netflix offered a $1 million prize to anyone who could improve the accuracy of recommendations over its existing algorithm by 10 percent. It took three years for a team to win the prize, and the prediction process proved so complex that Netflix never implemented it. Besides, a 10 percent improvement represents only about a 0.1 star improvement in predicting someone’s movie preference.
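
To see where the 0.1 star figure comes from, here is a minimal sketch of the arithmetic in Python. It assumes that accuracy is measured as root-mean-square error (RMSE) in stars and that the existing algorithm's error was about 0.95 stars. Both the baseline value and the function name `rmse_gain` are illustrative assumptions, not figures taken from the HBR post.

```python
def rmse_gain(baseline_rmse: float, relative_improvement: float) -> float:
    """Return the absolute drop in RMSE (in stars) for a relative improvement."""
    improved_rmse = baseline_rmse * (1.0 - relative_improvement)
    return baseline_rmse - improved_rmse


# Assumed baseline error of about 0.95 stars (illustrative, not from the post).
baseline = 0.95
gain = rmse_gain(baseline, 0.10)
print(f"Absolute improvement: {gain:.3f} stars")
```

Under that assumed baseline, a 10 percent relative improvement shaves a little under a tenth of a star off the typical prediction error, which is why the gain is hard to notice on a five-star scale.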

Apart from anything else, this helps explain why I never saw an improvement in the system’s ability to recommend new movies for me to watch (compounded no doubt by the fact that my wife and I have very different tastes when it comes to movies, but we share an account). It seems to me that one of the fundamental issues is that the system assumes that genre is the overriding reason why people might like a movie. Maybe the prediction could be improved if people got to classify the movies by factors other than overall genre, and not just rate them. For instance, I like science fiction movies but only if they are intelligent, thought-provoking and well-crafted.

So where does Piatetsky-Shapiro suggest big data will have its biggest impact? Artificial intelligence. He points to IBM’s Watson and Apple’s Siri as indications of things to come. (Why am I reminded of HAL from 2001: A Space Odyssey, GERTY from Moon, and David from Prometheus?) Piatetsky-Shapiro also nominates individual healthcare and location-based analytics as areas where big data will prove important. He suggests that the success of social networks such as Facebook, Twitter, and LinkedIn depends on their scale, and that big data tools and analytics will be required for them to exploit that scale effectively.

In spite of my skepticism about many of the claims made about big data, I can’t help feeling that Piatetsky-Shapiro is too pessimistic when he suggests that the randomness inherent in human behavior is the limiting factor to consumer modeling success. He states:

“Marginal gains can perhaps be made thanks to big data, but breakthroughs will be elusive as long as human behavior remains inconsistent, impulsive, dynamic, and subtle.”

The true cause of people’s behavior may not be immediately apparent, and without asking people questions about why they behave the way they do, you are entirely reliant on the researcher’s interpretive ability. Maybe what is needed, as I have suggested in the past, is the marriage of big and little data?

A major advantage of traditional questioning techniques is that they elicit responses that might not naturally occur to the respondent, but which we know are important determinants of behavior.

So what are your thoughts on the value of big data? Does it offer the ability to understand why people do what they do? Where do you think it will have the most influence? Please share your thoughts. 


Leave a comment
  1. Joshua, December 07, 2012

    Hi Nigel - interesting post, and I agree with you that our ability to predict is still quite poor in many contexts. However, I would posit that our failure in prediction is not a failure in the data, but rather a failure to broaden our view of which data sets are acceptable to work with in a given context and which aren't. To illustrate this point, allow me to reference a case study from a very interesting book called The Power of Habit by Charles Duhigg. In this book Duhigg discusses myriad topics and their relationships to creating behavioral habits in individuals, organizations and societies, including Target's (the big retailer) successful pregnancy predictive analytics. Duhigg was good enough to post to the NYT what is essentially a CliffsNotes version of his book, which I have excerpted below. In a nutshell, the Target case study examines how the head of their analytics was able to crack the predictive problem of identifying pregnant women early in their terms by combining sets of data about otherwise non-obviously related products that, *when taken together in sequence*, do dramatically increase the chance of correctly identifying that a woman is indeed in the early stages of pregnancy. This is where the power of big data and prediction really lies: moving beyond the sets of data we think should be obviously related and developing the skills to home in on the context data exists within, and then the key sets that are actually useful because of that particular context. In other words, being self-aware enough to choose to refine our hypotheses as we discover that the biases you talked about in your other post have skewed our interpretive/predictive abilities.

    I would also add that Nate Silver's work, as mentioned by Al, is a related, though not exactly matching, conceptual example. Silver’s approach combines a weighted score of several different state polls (while intentionally excluding national polls) to arrive at a highly correlated Bayesian prediction of the aggregate results in state elections. The key is not to combine all known data points. The useful key data set consists of state polling data that has been weighted to control for known skew and other factors. And as Al said, the prediction is generally right for the group as a whole, which helps mitigate some of the concerns you raise in the marrying big and little data piece. Read more by Silver at the NYT FiveThirtyEight blog (http://fivethirtyeight.blogs.nytimes.com/). Silver's chapter on the systemic failure of key leaders to predict the global financial crisis induced by mortgage-backed securities is also quite interesting/informative on the assumption and bias piece. (http://www.rakethru.com/prices/isbn/9781594204111/The-Signal-and-the-Noise:-Why-So-Many-Predictions-Fail-but-Some-Don't/ALL/)

    Duhigg POWER OF HABIT excerpt http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?adxnnl=1&pagewanted=all&adxnnlx=1354860078-Xm8Y/aOlg4rVLp2od6TzAw&_r=0

    "…The only problem is that identifying pregnant customers is harder than it sounds. Target has a baby-shower registry, and Pole started there, observing how shopping habits changed as a woman approached her due date, which women on the registry had willingly disclosed. He ran test after test, analyzing the data, and before long some useful patterns emerged. Lotions, for example. Lots of people buy lotion, but one of Pole’s colleagues noticed that women on the baby registry were buying larger quantities of unscented lotion around the beginning of their second trimester. Another analyst noted that sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium and zinc. Many shoppers purchase soap and cotton balls, but when someone suddenly starts buying lots of scent-free soap and extra-big bags of cotton balls, in addition to hand sanitizers and washcloths, it signals they could be getting close to their delivery date.

    As Pole’s computers crawled through the data, he was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.

    One Target employee I spoke to provided a hypothetical example. Take a fictional Target shopper named Jenny Ward, who is 23, lives in Atlanta and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant and that her delivery date is sometime in late August. What’s more, because of the data attached to her Guest ID number, Target knows how to trigger Jenny’s habits. They know that if she receives a coupon via e-mail, it will most likely cue her to buy online. They know that if she receives an ad in the mail on Friday, she frequently uses it on a weekend trip to the store. And they know that if they reward her with a printed receipt that entitles her to a free cup of Starbucks coffee, she’ll use it when she comes back again…"

  2. Al, November 07, 2012
    I agree with Nigel's point about marrying big data and little data. However, I think there's another crucial point here: surely not even big data evangelists suggest that data can predict one individual's decisions or likes. All we can expect it to do is be more right, more often, about the trends of mass populations. Individual humans are random, emotional, inconsistent and impulsive, but humankind in general is mathematically predictable in many ways.
    Nate Silver never claimed to be able to predict how one swing voter would vote, but he has accurately predicted the electoral college victories in 49 out of 50 states, so far, based on application of big data.
  3. Nigel, November 06, 2012
    Thanks for the comments, Nick and Ed. The more I think about this, the more I believe the key issue is less "what can you learn?" and more "what can you do?" In other words, most cases I have read about seem to involve "intercessions" to change the predicted behavior, usually in the form of offering discounts or special offers. That is a pretty expensive and potentially dangerous form of marketing, i.e., it trains people to game the system to their advantage. Has anyone come across examples that have informed brand positioning or campaign ideas?
  4. Ed C, November 02, 2012
    I think this raises the question: "is human behavior formulaic?" or better yet: "does free will really exist?" I'm a believer that our past experiences do shape our future behavior, but will anyone ever really have a grasp on ALL those variables enough to predict everything? Big data can help and probably will in many fields, though I think it will be a huge undertaking to make a dent, and there must be some financial benefit to the company doing this type of research to even start the process. It was cool that IBM created Watson, though I'm not sure how much money they made from it.
  5. Nick, November 02, 2012
    Will big data ever truly understand why people do what they do? The correct answer is no, but it can definitely get you pretty close. The beauty of being human is our unpredictability. The Netflix 10% example is great, but don't forget the big data precursor which got Netflix's recommendation algo up to a superior level. Though I'm usually up for any one of the Matrix trilogy films, my tastes could skew at any moment, and even if Netflix got to a workable model for the +10% increase, they could still never predict what film I'm going to choose.
