Automated Sentiment Analysis is For Suckers

[Image: My automated sentiment analysis cannot tell if this Facebook post is positive or negative.]

I have used virtually every social monitoring software package – Radian6, Alterian, Spiral16, Sysomos, Lithium, and a bunch more. Although I’ve used some more than others, there’s one thing I’ve learned after several years of experience with these social monitoring tools – automated sentiment analysis is for suckers.

If you sit through a social monitoring software demo, you will be told about the software’s sentiment analysis capabilities. All you have to do, the pitch goes, is click a few buttons, and the software will regurgitate a report that spells out whether the conversations taking place about your brand across social channels are positive or negative. Even better, you’ll be able to see how this trends at a glance from month to month.

So you’ll sign a contract, beaming with pride as you regale your boss with tales of your new tool’s capabilities. You will pay that monthly invoice with a smile on your face, reflecting on the beautiful pie charts you’ve presented to your executive team.

And one day, when you’ve got some time on your hands, you’ll dig into the data and see how these tools actually measure sentiment. And once you realize you’ve been passing on woefully inaccurate reports to high-ranking officials for several months, you’ll feel something like this:

[Image: I have been giving automated sentiment analysis reports to management and I’m just now checking the data.]

Although automated sentiment analysis is alluring, the problem is that social media software just isn’t capable of parsing emotion from words alone. For the same reason that Commander Data’s Sisyphean quest for humanity made a compelling storyline across seven seasons of Star Trek: The Next Generation – machines can’t truly grasp human emotion – automated sentiment analysis tools can’t cash the checks written by many social monitoring software sales reps.

Here’s a quick example. After running a query and configuring sentiment analysis for “iPhone”, my primary social monitoring tool told me the following post was positive:

[Image: Rated as “positive” about iPhones by social monitoring software]

Conversely, my social monitoring tool rated the following post as negative with respect to iPhone:

[Image: Rated as “negative” about iPhones by social monitoring software]

You can see the problem here. Now, I’ve cherry-picked my examples – most aren’t this egregious. But when you extrapolate errors like these over a large sample, they start to create serious problems that challenge the viability of automated sentiment analysis.

Now, in fairness, most social monitoring sales reps will readily admit that their software is not 100 percent accurate (whether they’ll volunteer that information without you asking for it is another question entirely). Most reps will tell you that their automated sentiment analysis is about 70 percent accurate, and they’ll generally say that’s about the ceiling any vendor can hope to achieve.

Even with that deficiency, automated sentiment analysis is still seductive. Most people’s first instinct is to say “70 percent accuracy? That’s close enough, I can live with that.” The problem is that once you start to do the math, even 30 percent inaccuracy starts to cause some significant problems from an analysis standpoint.

Consider this scenario. Let’s say you have a brand with 100 mentions: 40 positive, 40 negative and 20 neutral. This sentiment was measured by human eyes, and can be thought of as 100 percent accurate.

[Image: 100 mentions: 40% negative, 20% neutral, 40% positive]

Now, let’s assume that 30 percent have been graded incorrectly:

[Image: 70 mentions were correctly analyzed by automated sentiment analysis – 30 mentions were not]

Assume that the 30 mentions outlined by the red box will be graded incorrectly by automated sentiment analysis – an even distribution across the three sentiment classes. If we know that the 12 incorrectly graded positive mentions are not actually positive, we might assume that six of those 12 are actually negative and six are actually neutral. If you make similar assumptions about the incorrectly scored negative and neutral mentions, you might project the distribution of automated sentiment analysis with 30 percent inaccuracy to look something like this:

[Image: Automated sentiment analysis underreports negative mentions (37%), underreports positive mentions (37%) and overreports neutral mentions (26%)]
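
If you want to sanity-check that arithmetic yourself, here’s a minimal sketch in Python with the scenario’s figures hard-coded – it just mechanically applies the even-split assumption described above:

```python
# Even-error model from the scenario above: 100 mentions, 30% of each
# class misgraded, each misgraded mention split evenly between the
# other two classes.

actual = {"positive": 40, "negative": 40, "neutral": 20}

reported = {label: 0 for label in actual}
for label, count in actual.items():
    misgraded = count * 30 // 100            # 12, 12 and 6 mentions
    reported[label] += count - misgraded     # the 70% graded correctly
    for other in actual:
        if other != label:
            reported[other] += misgraded // 2  # even split to the other two

print(reported)  # {'positive': 37, 'negative': 37, 'neutral': 26}
```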

Okay, so your positive and negative distribution isn’t off by much. Neutral is more overreported than I’d like, but I’d still consider that tolerable. But this is a best-case distribution. What if, at the other extreme, the 30 percent of incorrectly analyzed mentions all fell within the same sentiment band?

[Image: Automated sentiment analysis will process 30% of mentions incorrectly – what if these are the mentions that are graded wrong?]

You can’t necessarily assume that incorrectly graded mentions will be evenly distributed. We know that 30 mentions will be graded incorrectly. What happens if those 30 incorrectly graded mentions are all truly negative? If 30 of the 40 actual negative mentions are not graded as negative, the software will tell you that only 10 percent of mentions were negative:

[Image: Negative mentions are underreported substantially (10%), neutral mentions are overreported substantially (35%) and positive mentions are overreported substantially (55%)]

Of course, this could just as easily swing the other way. There’s nothing to say that the 30 incorrectly graded mentions out of the 100 shouldn’t all have been truly positive – in which case positive mentions would be significantly underreported.

So, still taking it as a given that 30 percent of mentions will be graded inaccurately, consider how far the reports can swing depending on where those errors fall:

  • If 40 percent of mentions are actually positive, given 30 percent inaccuracy, automated sentiment analysis could credibly say anywhere between 10 and 55 percent of mentions are positive. Pretty big swing!
  • If you extrapolate the 40-20-40 scenario and take as a given that sentiment analysis will be wrong 30 percent of the time, your tool might spit out any of the distributions below (see the sketch after the image):
[Image: If the middle column is the actual distribution, depending on which mentions were incorrectly graded, underreporting or overreporting of sentiment can be quite extreme]
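
To make the extremes concrete, here’s a small sketch that reproduces the two worst-case rows above – all 30 errors landing on a single class, with the misgraded mentions split evenly between the other two, as in the earlier examples:

```python
# Worst-case rows from the table: all 30 errors hit one class, and the
# misgraded mentions split evenly between the other two (the same
# assumption as in the examples above).

actual = {"positive": 40, "negative": 40, "neutral": 20}
total_errors = 30

for victim in ("positive", "negative"):   # neutral only has 20 mentions,
    reported = dict(actual)               # so it can't absorb all 30 errors
    reported[victim] -= total_errors
    for other in reported:
        if other != victim:
            reported[other] += total_errors // 2
    print(f"all 30 errors truly {victim}: {reported}")

# all 30 errors truly positive: {'positive': 10, 'negative': 55, 'neutral': 35}
# all 30 errors truly negative: {'positive': 55, 'negative': 10, 'neutral': 35}
```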

And of course, in each of these scenarios, we’re just taking it as a given that 30 percent of mentions are graded inaccurately. What if, for the query you run, only 20 percent are inaccurate? What if 10 percent are inaccurate? Fifty percent?

I can live with an inaccurate tool. The problem is, it needs to be consistently inaccurate. For instance, Google Analytics is well known to under-report, somewhere in the vicinity of 10-20 percent. The thing is, assuming it’s installed properly, it never over-reports – by the way it works, it can’t. This means you can still make educated decisions, because as long as the under-reporting is consistent, you can still draw insights from trends – movement in the data.
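
Here’s a toy illustration of why a consistent bias is survivable – the traffic numbers below are made up, but they show that a flat under-report cancels out as soon as you compare periods:

```python
# Toy illustration: a tool that always under-reports by the same factor
# still preserves period-over-period trends, because the bias cancels
# out in the ratios. (The traffic numbers are made up.)

actual = [1000, 1200, 1500]                  # true monthly figures
reported = [v * 85 // 100 for v in actual]   # flat 15% under-report: 850, 1020, 1275

actual_growth = [b / a for a, b in zip(actual, actual[1:])]
reported_growth = [b / a for a, b in zip(reported, reported[1:])]

print(actual_growth)    # [1.2, 1.25]
print(reported_growth)  # [1.2, 1.25] – the trend is identical
```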

Because there’s no consistency to the inaccuracy of automated sentiment analysis, you can’t be confident that the trends in data are even accurate. If you can’t be confident that the trends are accurate, well, you’ve got yourself a pretty useless tool from an analysis standpoint.

[Image: The utility of most automated sentiment analysis tools]

So here’s the thing: trends in brand sentiment are a useful thing to measure. But the reason everybody wants automated sentiment analysis to work is that manually computing sentiment is hard. Above all, it takes a lot of time, particularly if there are a lot of mentions. If it were accurate, automated sentiment analysis would be awesome.

The trick to producing a sentiment analysis metric is to have a process that’s manageable and replicable. Here’s how I do it.

How I perform social media sentiment analysis

I’m going to assume your social monitoring tool picks up public-facing Facebook mentions. If it doesn’t, it’s not quite as easy, but it’s still doable.

Set your social monitoring tool to return only Facebook mentions. Because everybody uses Facebook, a Facebook mention is more likely to come from an actual human being than a mention from any other medium you can monitor. Nobody cares what a spambot says about your brand because, well, nobody listens to spambots. They listen to humans.

Depending on how much volume the term you’re monitoring generates, you’re going to need to settle on a sample size. I think 200 mentions is more than enough to constitute a representative sample. For any given date range, 100 is the minimum I feel I can work with to get an accurate gauge of sentiment levels.

If you’ve exported 200 Facebook mentions, just start counting them. Count negative, neutral and positive with tally marks.

If it sounds tedious, remember, you only really need 200. Even if it takes you a full ten seconds to read a mention and make a pencil mark in the appropriate box – which is generous – the whole exercise will take you a little over half an hour. At a more realistic 2-3 seconds per mention, it should take you 7-10 minutes total. If you’re so inclined (as I often am), this task is ideal for interns or junior staff, particularly those who need to bolster their critical thinking skills.
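
If you’d rather tally from a spreadsheet export than on paper, something like the following sketch works. The mentions.csv file and its text column are hypothetical stand-ins for whatever your monitoring tool actually exports:

```python
import csv
from collections import Counter

# Hand-tally helper. "mentions.csv" and its "text" column are hypothetical –
# rename them to match whatever your monitoring tool actually exports.
tallies = Counter()

with open("mentions.csv", newline="", encoding="utf-8") as f:
    for i, row in enumerate(csv.DictReader(f)):
        if i >= 200:   # 200 mentions is plenty for a representative sample
            break
        print(row["text"])
        grade = input("[p]ositive / [n]egative / anything else = neutral: ").strip().lower()
        tallies["positive" if grade == "p" else "negative" if grade == "n" else "neutral"] += 1

print(tallies)
```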

Count them up and, voilà, you have an accurate sentiment analysis metric you can present to your management team with your head held high.

 

[Image: My sentiment analysis report is accurate, and I feel great!]

So what if you don’t have 100 Facebook mentions for the period you want to measure, whether that’s a week or a month? Or what if your tool doesn’t pick up public-facing Facebook posts?

It’s a little more of a pain, but if you have to, do the same thing I just described and either add in or substitute Twitter mentions.

The thing you have to be careful of with Twitter is that there are a lot more accounts and tweets that are not produced by human beings, and you’re not interested in those. When I extend this process to Twitter, I don’t count mentions that come from bots, spammers, or even news organizations and other brands. Mentions from those sources, even when they carry sentiment, don’t accurately reflect the sentiment of your average Joe – and learning the sentiment of your average Joe is why we’re undertaking this work in the first place.

Unfortunately, you’re introducing more volatility into the equation when you bring in Twitter: in addition to needing to accurately judge sentiment (a process that isn’t as cut and dried as you might think), you now need to judge whether each mention came from a human being, which will fuzz up the results.

As a result, if you end up measuring Twitter mentions for sentiment, you’re just going to have to keep going until you’ve got a sample of mentions from at least 100 human beings. That’s going to take a little longer, but hey, anything worth having doesn’t come easy!
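
Extending the same tally sketch to Twitter might look like this. The bot heuristic below is hypothetical and deliberately crude – it only pre-filters the obvious junk, and you should still eyeball each account yourself:

```python
import csv
from collections import Counter

# Crude, hypothetical heuristic – it only pre-filters the obvious junk.
BOT_HINTS = ("bot", "news", "deals", "official")

def is_probably_human(handle: str) -> bool:
    return not any(hint in handle.lower() for hint in BOT_HINTS)

tallies = Counter()
humans_graded = 0

# "tweets.csv" with "author" and "text" columns is a hypothetical export.
with open("tweets.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if humans_graded >= 100:   # keep going until 100 human-authored mentions
            break
        if not is_probably_human(row["author"]):
            continue
        print(row["text"])
        grade = input("[p]ositive / [n]egative / anything else = neutral: ").strip().lower()
        tallies["positive" if grade == "p" else "negative" if grade == "n" else "neutral"] += 1
        humans_graded += 1

print(tallies)
```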

Whatever you do, do not measure the sentiment of anything classified as a blog, a forum entry or a piece of mainstream news:

  • Despite what zillions of infographics and gurus have told you, with extremely few exceptions, very few people are blogging about any one specific brand. I’ve run monitoring for big brands, and for every one blog post I come across with sentiment, I’ve generally found about 200-400 Twitter and Facebook posts from human beings that have sentiment. The bigger problem with measuring blogs is that almost everything most social monitoring tools classify as a “blog” wasn’t created by a human being, and as such, attempting to ascribe any sentiment to it is a worthless exercise.
  • Monitoring mainstream news is a noble pursuit, but not with respect to this type of sentiment analysis. Obviously, when the New York Times trashes your brand, there are major implications, but that’s almost separate from the purpose of this work. A negative mention in a mainstream news article weighs more than a negative mention on Facebook, sometimes by seismic magnitudes, and this work is hard enough without introducing authority into the equation. For the purpose of gathering the sentiment of your average Joe, ignore mainstream news.
  • The problem with forums is that their user base often isn’t representative of your average consumer. On a case-by-case basis, that might be advantageous – obviously a car manufacturer is going to care what people say in the comments on Jalopnik. In general, however, the mentions you’ll pick up are skewed toward the niche community that populates the forum – not ideal. Plus, when you monitor forums, you’re likely to get multiple mentions from the same user, which further wreaks havoc on getting an accurate bead on what you’re really trying to measure.

What do you think? Am I grossly wrong? I could be. I frequently am. Did I screw up my math? I’d love to learn. Let me know your thoughts!

5 Responses to Automated Sentiment Analysis is For Suckers

  1. Rudy says:

    Dude! Well said… and I actually sell text/sentiment analysis software! BUT… Ascribe also heavily recommends that the results be READ! At least in part. I BEG my Customers to read and select the meaningful content. Most of Ascribe is about actually coding and retaining the more meaningful (accurate) content to act as diagnostic content for the final business decision. The point is not to simply rely on the initial text/sentiment analysis out of some black box solution. One is meant to look under the covers to separate the wheat from the chaff. Ascribe is all about aiding that process.

    Remaining sceptical, but remaining indeed,
    Rudy
    rudy.bublitz@goascribe.com

  2. kdpaine says:

    This is great, but you haven’t even taken into account the fact that most of the data these systems are collecting is garbage – spam, pay per click, and/or people talking about “I need a visa,” not “Visa debit card.” All it takes is a bad search string or two and your accuracy levels drop even further.

  3. [...] likely to be the knee-jerk sentiments of the highly reactive and skewed Twitter audience. In fact, many have challenged the accuracy of the sentiments expressed in social media mentions, and that is no less true about [...]

  4. [...] One question worth pondering about is the issue of sentiment analysis accuracy. To date, some 4-5 years++ after sentiment analysis became active in mainstream businesses, the accuracy remains less than 100%. Much less in fact. Some recent writers suggest even 70% accuracy is considered an optimistic ratio, and that is the realistic ceiling. [...]
