Text Analytics: Be Aware of Magic Bullets


By Gord Ripley, Senior Vice President Operations at Verde Group

There are many text analytics platforms on the market today. In essence, they all claim to take unstructured data (verbatims) and turn it into structured insights. Terms like machine learning, artificial intelligence, natural language processing and sophisticated algorithms get thrown around, but are these platforms the magic bullets they claim to be?

Over the years, I have had a lot of experience with these platforms, using them daily and sitting through countless sales demos as our company continues to search for that elusive bullet. Here is what I have found, both through that search and through real-life experience with these platforms.

To set the stage, we are a market research company that specializes in quantifying the financial risk to a company based on the problems its customers are experiencing. In a nutshell, we take data seriously, so the precision of our research is paramount to us and our clients.

The general product proposition for text analytics platforms is this: load a large quantity of text into the platform, then sit back and let the tool do all the work. The output should be tagged (coded) phrases/words/concepts with a level of sentiment attached. Let’s break that down:



Tagging/coding

The text analytics platform analyzes a sentence or paragraph and “codes” or tags the verbatim to quantify what is being said. This is the area where I see the most problems with these platforms. They are simply not as accurate as advertised when it comes to properly tagging or coding a verbatim comment.

The issue is that the platform looks for keywords and tags them to a concept, but many times the resulting tag ends up being something completely different from what the customer intended. I have seen countless examples of this in my real-life experience with a top platform, and even in sales demos. Look closely during your next sales demo and you will see the misclassifications. If you currently use a platform, have a look at the tags to ensure they are being properly classified.

Tagging to concepts is critical to the conclusions drawn and therefore it is incredibly important that this step be done correctly. Otherwise, we could be misrepresenting the issues that customers are having.
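To make the keyword-lookup problem concrete, here is a minimal Python sketch of that approach. The keyword map and the example comments are hypothetical, invented purely for illustration; real platforms are far more elaborate, but the failure mode is the same: the word matches, the meaning does not.

```python
# A hypothetical keyword-to-concept map, for illustration only.
KEYWORD_TO_CONCEPT = {
    "charge": "Billing",
    "wait": "Service Speed",
    "return": "Returns Process",
}

def tag_comment(comment: str) -> list[str]:
    """Tag a verbatim comment by simple keyword lookup."""
    words = comment.lower().split()
    return [concept for keyword, concept in KEYWORD_TO_CONCEPT.items()
            if keyword in words]

# A comment the lookup gets right:
tag_comment("I was double billed for this charge")   # tagged "Billing"

# A comment it gets wrong: "charge" here refers to a battery,
# but the keyword match still files it under "Billing".
tag_comment("the battery will not hold a charge")    # also tagged "Billing"
```

If tags like the second one flow into the results unchecked, the issue counts behind your conclusions are wrong before any analysis begins.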



Sentiment

I have found that these platforms perform well when it comes to properly classifying sentiment. I would suggest keeping the sentiment simple (positive/negative/neutral) rather than trying to distinguish levels of positive or negative. Of course, if the tags above are wrong, is the sentiment being applied to the correct tag or theme?
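The simple three-class approach suggested above can be sketched in a few lines. The word lists here are hypothetical and nowhere near complete; the point is only that a coarse positive/negative/neutral call is more robust than trying to grade shades of sentiment.

```python
# Hypothetical sentiment word lists, for illustration only.
POSITIVE = {"great", "love", "helpful", "fast"}
NEGATIVE = {"broken", "rude", "slow", "terrible"}

def simple_sentiment(comment: str) -> str:
    """Classify a comment as positive, negative, or neutral."""
    words = set(comment.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

simple_sentiment("the staff were rude and slow")        # "negative"
simple_sentiment("great service and very helpful staff") # "positive"
```

Note that even a correct sentiment label is only useful if it is attached to the right tag in the first place.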

Let the platform do the work


Not so fast. These platforms generally require a significant amount of manual user manipulation to get things right, or at least more correct than the platform’s original output. To borrow a golf analogy: it doesn’t matter how much you pay for a set of clubs, your swing is still your swing.

So, prepare yourself and/or your team for hours of manipulation of the tags/codes.

For a tracking project, this time investment may be worthwhile in the long run, but for a one-off project, it may not be worth the effort compared to manual coding of verbatim comments by skilled human coders (who can also apply sentiment).


I hope this has not come off as too negative, as I believe these platforms will continue to evolve over time. However, we cannot forget that human language, and how we express ourselves, tends to be very complicated, especially in industries with unique lexicons. So, if you decide to pick a platform, just be aware that there are no magic bullets.