
Making sense of the Twitter fire hose

by Nigel Hollis | November 29, 2012

The data accumulating in databases today offer amazing opportunities for new insight. However, two things worry me: the problem of accepting consumer commentary at face value and the problem of garbage in, garbage out. Both of these issues have to be dealt with, particularly when analyzing social commentary. 

Millward Brown’s foray into this arena is to mine the entire Twitter stream to identify serial brand commentators, and then parse their tweets to summarize subject, tone, location and sphere of influence. Implemented by the Emerging Media Lab in the U.S., the resulting Verve Index provides companies with a way to compare and contrast their standing in the world of social media.

I was surprised, given the sheer quantity of data involved, that data handling is not the major issue. By renting computing capacity from Amazon’s Elastic Compute Cloud (Amazon EC2), the team can quickly scale to handle changes in volume around major events like the Super Bowl.

Far more of an issue is the amount of spam found in the Twitter stream. Though social media measurement is widely accepted as an accurate census of organic online conversation, much of what is being captured is actually mass-produced by non-human sources.

As Anne Czernek describes in this Point of View on the topic, a recent Millward Brown audit of Twitter data across about 60 brands found that as much as 60 percent of all brand data captured is spam. While most of it is fairly innocuous – the “click here to get a coupon” variety – it does change the overall tenor of the commentary. All of which means that while natural language processing and Bayesian rules help with data cleaning, at some point a human needs to step in and judge whether the commentary is worth attending to or not.
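To make the idea of Bayesian cleaning with a human in the loop concrete, here is a minimal sketch in Python. It is not Millward Brown's actual method, which the post does not describe. It assumes a plain naive Bayes word model with add-one smoothing, and the class name, the `triage` helper, the 0.2 and 0.8 thresholds and the four training tweets are all invented for illustration.

```python
import math
import re
from collections import Counter

LABELS = ("spam", "ham")  # "ham" = genuine commentary


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical word-count naive Bayes filter (illustrative only)."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = {label: 0 for label in LABELS}

    def train(self, text, label):
        """Record one hand-labelled tweet."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Return the estimated P(spam | text) under the naive Bayes model."""
        if min(self.doc_counts.values()) == 0:
            raise ValueError("need at least one training example per label")
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        log_score = {}
        for label in LABELS:
            total_words = sum(self.word_counts[label].values())
            # Start from the log prior, then add one log likelihood per word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
            log_score[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


def triage(spam_filter, text, low=0.2, high=0.8):
    """Auto-label confident cases; send everything in between to a person."""
    p = spam_filter.spam_probability(text)
    if p >= high:
        return "spam"
    if p <= low:
        return "keep"
    return "human review"


# Invented training examples, standing in for a hand-labelled sample.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("click here to get a coupon", "spam")
spam_filter.train("free coupon click here now", "spam")
spam_filter.train("love the new phone camera", "ham")
spam_filter.train("the service at this store was slow", "ham")

for tweet in ("click here for a free coupon",
              "love the camera on the new phone",
              "the coupon"):
    print(tweet, "->", triage(spam_filter, tweet))
```

The two thresholds are the design choice that matters here: widening the gap between them sends more tweets to a person and fewer to automatic labelling, which is the trade-off between cleaning cost and the risk of misjudging the commentary.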

Analyzing social commentary provides a rich source of stimulus to prompt ideas and hypotheses for further analysis. This is particularly true when the commentary is analyzed in parallel with traditional survey data sets, to understand how it interacts with other elements of the brand experience and marketing campaigns. But the value of such analysis will only be recognized if we know that the issue of garbage in, garbage out has been properly addressed.

In the early days of the Internet, I remember many people in start-ups stating something along the lines of “electrons are free.” These days it seems that the assumption is “data is free.” But even if the data is really free, there are hidden costs attached to managing and analyzing big data. Ensuring that the data being analyzed is meaningful is just one of these hidden costs. And the real hidden cost is making the wrong judgment based on the data. Without an emphasis on data quality and cleaning, the chances of incorrect assessment and interpretation become significant.

Verve is just one way to harness big data alongside traditional research techniques. How else do you believe research will adapt to the world of big data? And how big an issue do you expect data quality to be? Please share your thoughts.


Leave a comment
  1. Steve, December 05, 2012
    Big Data doesn't mean Big Insights. All of this needs context. Yes, it sounds impressive to say we analyse the Twitter Firehose, but I bet if you sampled 10% of that, you'd end up with the same story. What about blogs, forums, review sites? Twitter is a fantastic resource, but it's not the be all and end all of social media data. I'm often left with a sour taste in my mouth after all the hoopla of social media 'analysis', leaving me with a niggling feeling of "So what?" Don't get me wrong, I love this data, but just providing numbers and saying you analyse bazillions of posts doesn't impress me any more. What kind of analysis? What can I do with the numbers? How does it help me? What problem is it solving?
  2. Nigel, December 04, 2012
    Hi Martin, thanks for the comment. Seems the link you posted was broken. Folks can find Rick's post here: 
    While I am in total agreement that good methodology is required to produce good data, maybe the methodology is not always a survey. Maybe it is a way to parse good data from bad.
    Besides that point, I love Rick's setup: "BIG DATA! Coming this Friday to the Scotia Bank Place, see Biiiiiiiiiiig DATA take on social media and gamification in a battle royale. Watch as BIG DATA collides with insights and smashes heads with analytics."
  3. Martin Silcock, November 29, 2012
    If companies want a way to compare and contrast their standing in the world of social media what can they use this for?

    Accumulating data sounds great, and it's fortunate that it's become cheapish...but what do you use it for with regard to real insight?

    I think it's probably more useful in simply automating some parts of digital marketing and customer experiences, as in stock market trading. Predictive modelling tries to learn from behaviour patterns...but that is not insight, it's marketing automation.

    Maybe MR should adapt by focusing on methodology. Ric Hobbs in a post 

    "Software is not a methodology. You do not need Big Data, you just need good data, and good data comes from good methodology."
