By: Daniel Gutierrez
Originally published at http://insidebigdata.com
In insideBIGDATA’s special guest feature, Jeff Catlin of Lexalytics lays out the case for text analytics and its importance to the rising interest in big data. Jeff is CEO of Lexalytics, a company providing sentiment and intent analysis to an array of businesses using on-premise and cloud-based technology.
Sensors, tweets, emails, web clickstreams, CRM information, supply chain tools – data is flooding into every business, and the businesses that have the most facile processes for divining actionable information from the deluge are going to be the businesses that make the most money. This data deluge is not just a problem for large enterprises. Small businesses also interact with their customers using many channels and have websites, databases and often large amounts of other data to analyze. Hence all the buzz around “big data.” But what does that phrase actually mean, and how does it apply to your business?
The term “big data” is ambiguous – when does it actually cross the line from just being “data”? The plain truth is that nobody is really sure. Some people have definitions like “when you have to use tools like machine learning” or “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” It can be more useful to define Big Data as the data necessary to make decisions that have a positive impact on a business. Think about the “bigness” not in terms of literally how much data there is, but in its potential to help make more money.
If it can be counted, it can be analyzed. If it can be analyzed, it can be interpreted. But what type of count or interpretation can be made from a voice recording of a customer service transaction? How are tweets or prose to be interpreted? What type of information can be gleaned from customer product reviews? What happens when those reviews are videos?
Unstructured data is a large part of big data. You can get a lot of information from purely structured data, things like the Click Through Rate (CTR) and conversions from an advertising campaign. But that’s not going to give you a view into what is actually being said. What is the conversation? In order to delve into the dialog, and whether it is a positive one for your business, you have to get into the unstructured side of things.
The difference between unstructured data and structured data is simple. Computers are very, very good at manipulating structured data, and a whole suite of tools has grown up around visualizing and making predictions from this data. Structured data is basically numbers at its core – how many times a page was visited, how long someone was on your site, where they came in from, what products they bought. Unstructured data is material like text (say, from a survey or from tweets), video, or a voice recording of a customer service transaction.
One approach to analyzing the conversation between a company and its customers and partners is to apply brute force manual labor. You can read all written communication – emails, tweets, and reviews; watch the videos; listen to the audio recordings. Then you can manually convert emotional impressions and interpretations of the conversation into structured data that can be fed into business intelligence tools as a complement to the organization’s traditional data sources.
Many businesses take this approach, some more formally than others. Some just take a look at their Facebook page likes and comments, or their tweet stream, trying to “get a vibe” for whether or not things are going well. Others ignore unstructured data – they don’t understand the value of the data or the insights that can be gleaned, or they believe that it is too hard for their business to use.
If structured data is big, then unstructured data is huge. The generally accepted maxim is that structured data represents only 20% of the information available to an organization. That means that 80% of all the data is in unstructured form. If businesses are gaining value from analyzing only 20% of their data, then there is a massive potential waiting to be leveraged in the analysis of unstructured data.
Unlocking this potential represents the next Big Data challenge. And for the text portion of unstructured data, the solution is text analytics. Also known as text mining or natural language processing, text analytics is the science of turning unstructured text into structured data. It has moved from university research into real-world products that can be used by any business.
Text analytics is focused on extracting key pieces of information from conversations. By understanding the language, the context, and how language is used in everyday conversations, text analytics uncovers the “who,” “where,” and “when” of the conversation, the “what” or the “buzz” of the conversation, “how” people are feeling and “why” the conversation is happening. Conversations are categorized and topics of discussion are identified.
Another key piece of information is the tone of the conversation, that is, the sentiment people express. Uncovering this sentiment is especially difficult, and is the largest reason why man is not going to be replaced by machine any time soon.
Identifying the “who,” “what,” “when,” “where,” “why,” and the sentiment of the conversation converts unstructured data into structured data, and enables businesses to listen to all of the conversations. The structured data of the conversations can then be incorporated into businesses’ existing big data business intelligence and business analytics packages.
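To make the idea of converting unstructured text into structured data concrete, here is a minimal Python sketch. It is only an illustration: the word lists, the `analyze` function, the field names, and the sample message are all hypothetical, and real text analytics systems rely on much richer language understanding than simple word matching.

```python
import re

# Hypothetical, tiny sentiment word lists (illustration only).
POSITIVE = {"great", "love", "friendly", "clean", "excellent"}
NEGATIVE = {"dirty", "rude", "slow", "terrible", "broken"}


def analyze(author, timestamp, text):
    """Turn one free-text message into a flat, structured record."""
    words = re.findall(r"[a-z']+", text.lower())
    # Naive sentiment score: positive word count minus negative word count.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return {
        "who": author,
        "when": timestamp,
        "mentions": re.findall(r"@(\w+)", text),   # accounts referred to
        "hashtags": re.findall(r"#(\w+)", text),   # rough stand-in for topics
        "sentiment": sentiment,
        "score": score,
    }


# Made-up sample message, not real customer data.
record = analyze(
    "guest42",
    "2015-04-01T09:30",
    "Love the pool at @SeasideHotel but the room was dirty "
    "and the staff were rude #vacation",
)
print(record)
```

Each message becomes a row of named fields, which is the kind of structured output that could then be loaded into a business intelligence tool alongside traditional data sources.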
Complete text analytics systems are marketed as social media monitoring solutions, or as voice of customer or customer experience management solutions. Many vendors incorporate both social media and voice of customer into one package, enabling companies to listen and respond to conversations with the customer, and to mentions of the company in the wild.
A classic use case is brand management. Brands often sponsor events – the Olympics, major league sports, local marathons, or charity events. If a brand-sponsored event is poorly managed, negativity from the event can attach to the brand. Unless the company is listening to the conversations around the event, it may never know about the negative sentiment, and may not understand what caused a drop in sales.
The hospitality and restaurant industries also benefit greatly from using text analytics to listen to the conversation. Much of the customer feedback for hotels, resorts, and restaurants takes place outside of the customer-company conversation. Reviews can be placed on a plethora of websites, forcing companies to manually seek out and interpret the conversation. With automated text analytics tools, a hotel can quickly and easily assess whether it should be spending money on new linens or pool improvements.
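The linens-versus-pool question can be sketched as a simple tally of complaints by topic. This is a rough illustration under stated assumptions: the keyword sets, the `complaints_by_topic` function, and the sample reviews are invented for the example, and keyword matching is a crude substitute for what a commercial text analytics tool would do.

```python
import re
from collections import Counter

# Hypothetical keyword sets (illustration only).
TOPICS = {
    "linens": {"sheets", "linens", "towels", "bedding"},
    "pool": {"pool", "jacuzzi"},
}
COMPLAINT_WORDS = {"dirty", "stained", "old", "cold", "broken", "worn"}


def complaints_by_topic(reviews):
    """Count sentences that pair a complaint word with a topic keyword."""
    counts = Counter()
    for review in reviews:
        # Work sentence by sentence so praise for one topic is not
        # confused with a complaint about another in the same review.
        for sentence in re.split(r"[.!?]+", review):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            if words & COMPLAINT_WORDS:
                for topic, keywords in TOPICS.items():
                    if words & keywords:
                        counts[topic] += 1
    return counts


# Made-up sample reviews, not real customer data.
reviews = [
    "The sheets were stained and the towels felt worn.",
    "Great pool, kids loved it.",
    "Bedding was old. The pool was lovely.",
    "Pool heater was broken.",
]
counts = complaints_by_topic(reviews)
print(counts.most_common())
```

With these sample reviews, linens draw more complaints than the pool, which is the kind of signal that could inform where a hotel spends its improvement budget.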
Text analytics can be used to develop a better understanding of the likes, dislikes and motivations of the customer. Changing loyalty program incentives to match customers’ desires can improve customer loyalty and increase sales.
There are many other examples, and the uses of text analytics to listen to the conversation are essentially limitless. And there is significant value in listening to the conversation. The conversation is immediate – people are talking in the moment they have an experience, in the moment they interact with the brand or the company. They are having conversations to try and figure out which brands they trust and want to have as part of their lives. While sales are a lagging indicator, discussions are a leading indicator.
It can seem like a challenge to keep an ear to the ground, listening to conversations about your business, competitors, customers and suppliers. But if you’re not listening, you’ll be surprised when the winds change. While sometimes the surprise is good, often it is not. And when negative conversations take place, the impact to the business can be drastic.
Fortunately, text analytics has come of age, and businesses both large and small can benefit from listening to the conversations taking place. Just recently, Facebook announced the availability of Topic Data, which uses text analytics to reveal what audiences are saying on Facebook about events, brands, subjects and activities. Marketers use this information to build product roadmaps and make better decisions about their activities.
Some marketers will use off-the-shelf services to collect, analyze and visualize the unstructured data, such as social media monitoring or customer experience management systems. There are hundreds of companies serving this market, and more launching every day. Other analysts have an affinity for a particular business intelligence tool, and many of these tools play nicely with text analytics, allowing the analyst to blend the structured and the unstructured data into a coherent story. Because of the broad availability of tools serving different niche markets at different price points, a business can start as small as it wants and build from there.
Major companies have made clear moves showing the importance of text analytics. For example, IBM has been heavily promoting its Watson platform, and recently acquired AlchemyAPI to augment the analytics side of Watson. In another example, Microsoft purchased Equivio, a text analytics company focusing on eDiscovery.