Data, data, data everywhere, and what do I do with it? How do I make sense of it in a way that is useful to the business? More importantly, how do I tell the story once the solution is built? Visualization. Virtually every software package today offers some kind of visualization, even rudimentary tools such as Excel. But packages differ in the complexity with which they deal with the data. By complexity, I mean the tool's ability to provide increased interactivity with the data; essentially, the software gives the business user more control over the data and how they want to view it. The second key feature of these tools is simplicity, or level of intuitiveness: can the user easily navigate the package to achieve their desired visualization objective? There is no question that a series of charts and graphs linked together in an effective manner provides a very good framework for “telling the story”. But how are these linkages determined, and what about the data underlying each of these charts or graphs?
Too often the general business public is bombarded with promotions and ads claiming that visualization tools are the panacea for “telling the analytics story”. Visualization is often presented as a means of overcoming the seemingly insurmountable communication barriers between the data scientist and the business person. The common prejudices are familiar: business people belabor the fact that the data scientist does not understand the business problem, while data scientists complain that insufficient attention to the “math” and the “data” inhibits the creation of effective solutions. Software vendors present their visualization products as a means of “putting a picture” on data science solutions. But this is easier said than done. The real work is the data lying beneath these visualization tools; the data must still be “worked” in order to provide that visualization capability. To appreciate this, consider the construction industry, where over 90% of a house's real value arises from the right engineering and architectural processes, yet it is the remaining 10% or less, the “finish” of the house (painting, drywall, etc.), that determines how the house visually appears. In a way, the finish of the house is akin to the visualization of data. In construction, the finish is irrelevant without the right engineering and architectural principles used in building the house. In analytics, visualization is meaningless without the data foundation or, to put it in construction terms, without any “engineering” or “architecture” of the data. The key in building this data foundation is the analytical file, which is the core responsibility of the data scientist.
The building of this foundation, of course, commences with the business problem that needs to be solved. The business stakeholder may not have a specific solution in mind but will have an understanding of the key data elements that are potential inputs to solving the problem. With these relevant data elements, the data scientist as well as the business analyst can explore the data in more detail to determine the insights that might be useful in solving the problem. At this point, these insights can be turned into a “picture” or visualization that tells the story, but with data as the foundation.
In today’s Big Data world, though, some organizations seek to identify problems through what is called a data discovery approach. Instead of using data only to solve a specific, predefined problem or challenge, organizations also use data to identify problems upfront. As visualization technology has advanced, raw data such as web log files, transaction files, etc. can be analyzed without any data preparation or manipulation by the data scientist. For example, at a very basic level, Google Analytics provides clean graphical trends over time, giving insight into which days of the month yielded the most activity. Further drill-down can reveal the types of pages that generated the most interest during that time. Yet, although Google continues to increase its dominance of analytics within the digital sphere, other tools are required when looking at non-web data.
Even within our increasingly digital, Big Data world, optimizing customer value is still a core mission of all organizations, and CRM programs and analytics are the tools used to achieve this goal. Here the data needs to be worked in order to create that “one view” of the customer, which requires that the data scientist manipulate and organize the information. However, before the data scientist even performs this kind of work, some type of “forensic” analytics can be done on the raw data. Why would this be done? Once again, it’s all about learning. Is there something inherent in the raw data that the analyst needs to understand before even creating this one view of the customer? Take the example of sales or transaction data. Scatter plots can easily be displayed against the raw data. Similar to what we might observe in Google Analytics, our first view might be to look at transaction records over time, trying to observe out-of-pattern behavior. Are there certain time periods when we see unusual behavior? We can then overlay this data with transaction amount to see if there is something unusual about transaction amounts during that time period. A third overlay might be transaction type, to identify specific products and/or services that might be out of pattern during the period. Many of the more advanced tools will allow the analyst to conduct statistical analysis to see if these findings are indeed statistically significant, given the out-of-pattern view presented visually to the analyst. This initial view of the raw data could yield findings that the data scientist incorporates when building the one view of the customer. For example, suppose that during a given two-week period, a certain geographic market promoted a certain product at a steep discount to its most valuable customers.
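This kind of forensic screen for out-of-pattern time periods can be sketched in a few lines of pandas. The data below is synthetic and the column names (`date`, `amount`) are illustrative assumptions; the statistical check shown is a simple z-score on daily totals, standing in for whatever significance test a given tool provides.

```python
import numpy as np
import pandas as pd

# Hypothetical raw transaction data: one row per transaction.
rng = np.random.default_rng(42)
dates = pd.date_range("2024-01-01", periods=90, freq="D")
df = pd.DataFrame({
    "date": rng.choice(dates, size=2000),
    "amount": rng.gamma(shape=2.0, scale=25.0, size=2000),
})
# Simulate a two-week promotional spike in transaction amounts.
promo = (df["date"] >= "2024-02-01") & (df["date"] <= "2024-02-14")
df.loc[promo, "amount"] *= 3

# Aggregate to daily totals and flag days more than two standard
# deviations from the mean -- a simple "out of pattern" screen.
daily = df.groupby("date")["amount"].sum()
z = (daily - daily.mean()) / daily.std()
outliers = daily[z.abs() > 2]
print(outliers)
```

Plotting `daily` as a scatter over time, with amount and transaction type as overlays, is the visual counterpart of this check; the z-score simply confirms that what looks unusual to the eye is unlikely under normal variation.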
Given that this was a one-time event, some adjustment would need to be made to reflect this extraordinary behavior. Perhaps this time period is excluded when conducting any kind of analytics on this group of customers. In any event, exploration of this raw data through visualization identified issues that the data scientist needs to account for in any future analysis.
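Excluding such a one-time window before any customer-level analytics is a straightforward filter. A minimal sketch, with illustrative dates, customer IDs, and column names (none taken from the text):

```python
import pandas as pd

# Hypothetical transactions; the 2024-02-01 to 2024-02-14 window is the
# known one-time promotional period to be excluded.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "date": pd.to_datetime(["2024-01-15", "2024-02-05", "2024-02-10",
                            "2024-03-01", "2024-01-20"]),
    "amount": [40.0, 500.0, 450.0, 60.0, 35.0],
})
promo_start, promo_end = pd.Timestamp("2024-02-01"), pd.Timestamp("2024-02-14")

# Drop the promotional window so the anomaly does not skew the analysis.
clean = tx[~tx["date"].between(promo_start, promo_end)]

# Customer-level summary built only from regular-period behavior.
summary = clean.groupby("customer_id")["amount"].agg(["count", "sum"])
print(summary)
```

Note that the inflated promotional purchases (500 and 450) never reach the customer summary, so the resulting “one view” reflects ordinary behavior only.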
In most cases, though, the analysis of data requires some level of customization or data manipulation, with the ultimate deliverable being some kind of analytical file that will be used for visualization. Typically, these files, often referred to as pivot tables, represent a summarized view of the data. These pivot tables offer tremendous flexibility, allowing the user to create many different types of reports. The key in building these reports, though, is to understand the basic concepts of business analysis: what are we measuring (measures), and how do we want to view those measurements (dimensions)? Identifying what is to be measured and how we might want to view those measures is critical to the creation of any pivot table. The use of columnar and database compression technologies by firms within the business intelligence industry has yielded better tools to facilitate the development of a given solution. The increased granularity in measures and dimensions offered by these data exploration tools allows the analyst to consider a much wider variety of options in attempting to solve a given business problem.
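The measure/dimension idea above can be made concrete with a small pivot-table example. The dataset and column names here are invented for illustration; the point is simply that one measure (revenue, summed) is cross-tabulated by two dimensions (region and product).

```python
import pandas as pd

# Hypothetical sales records at the transaction level.
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100.0, 150.0, 200.0, 50.0, 75.0],
})

# Measure: revenue (aggregated by sum).
# Dimensions: region (rows) crossed with product (columns).
pivot = sales.pivot_table(index="region", columns="product",
                          values="revenue", aggfunc="sum", fill_value=0)
print(pivot)
```

Swapping the dimensions (say, month for region) or the aggregation (mean instead of sum) produces a different report from the same analytical file, which is exactly the flexibility the text describes.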
Data visualization is not a solution but simply one of many tools within the analyst’s arsenal or toolkit. With the right data visualization tools, more thought can be devoted to solving the problem and deciding what data is needed to solve it. Effective use of visualization is meaningless without the right data foundation. With that foundation in place, visualization can then be used to truly optimize the “story telling” process of analytics.