Technology and Big Data continue to bombard our working lives. Discussions and debates reinforce the fact that change is inevitable; no one disputes this premise, as change is really the one constant in life. But the difference in today's climate of change is the argument that the old approaches and disciplines for conducting certain business activities no longer apply. Certain pundits would have us believe that we are entering a new age of information and technology and that much of what was applicable in the past no longer applies today. Even our world of predictive analytics and data science is "fair" game. Like most practitioners, my background and experience have always been grounded in science and math. Objectivity has always been the underlying culture and philosophy for predictive analytics practitioners. So I am indeed fascinated when an opinion emerges that the core disciplines of data science and data mining will be revolutionized: because of Big Data and the need for speed, the four-step approach of problem identification, creating the analytical environment, application of data mining techniques, and implementation/tracking supposedly no longer makes sense. In fact, SAS has adopted a similar phased approach (SEMMA: Sample, Explore, Modify, Model, Assess) once the business problem has been identified. This is not too dissimilar from the CRISP-DM method, which covers business understanding, data understanding, data preparation, modeling, evaluation, and deployment. SEMMA, CRISP-DM, and the four-step approach above are all more or less similar methodologies. Yet the Big Data discussion has produced comments from certain opinion leaders that automation is the answer for everything and that attention to these disciplined approaches is less relevant in today's environment. As a data scientist of many years, I do not want to sound like a Luddite, but as with most aspects of life, it is about balance.
Technology continues to evolve in how we consume and analyze Big Data, and software and hardware providers continue to deliver tools that increasingly operationalize and automate these solutions. One question that often arises is whether artificial intelligence solutions will eventually supplant the need for predictive analytics practitioners. If we think about automation, there is a need for standardization in order for the computer to react effectively to its inputs. If the inputs represent a standard process, such as running routines for correlation analysis, factor analysis, exploratory data analysis reports, logistic regression, and validation decile reports, with the goal being a predictive model solution, then this type of process can be automated. But predictive modeling and predictive analytics go well beyond this standard process, since predictive analytics is ultimately about building a solution to solve a problem. Yet the very nature of business problems is their uniqueness, which essentially requires an analytics and information environment best equipped to solve the given problem. The variability of business problems and their solutions requires flexibility and adaptability, which is the unique advantage of the human brain over the computer. Let's consider this in more practical terms. The data inputs and their derived variables will be drastically different when building a predictive model for bank retention than for an insurance pricing model. Even within the same domain, such as a bank, the analytical file used to build a customer retention model will be very different from the one used to build a credit risk model for those same customers.
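To make the "automatable standard process" concrete, here is a minimal sketch in Python. All data, parameter values, and function names are hypothetical: a plain gradient-descent logistic regression fit to synthetic data, followed by a validation decile report. These are exactly the kinds of fixed, repeatable routines a computer can run unattended.

```python
import math
import random

random.seed(0)

def make_data(n=2000):
    """Hypothetical synthetic data: one predictor loosely driving a
    binary outcome (e.g. response / no response)."""
    rows = []
    for _ in range(n):
        x = random.gauss(0, 1)
        p = 1 / (1 + math.exp(-(0.8 * x - 0.5)))
        rows.append((x, 1 if random.random() < p else 0))
    return rows

def fit_logistic(rows, lr=0.1, epochs=200):
    """Plain gradient-descent logistic regression -- the kind of
    standard mathematical routine that can be automated."""
    b0, b1 = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in rows:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += p - y
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def decile_report(rows, b0, b1):
    """Validation decile report: rank by model score, split into ten
    bins, and report the observed response rate in each bin."""
    ranked = sorted(rows, key=lambda r: -(b0 + b1 * r[0]))
    size = len(ranked) // 10
    return [
        round(sum(y for _, y in ranked[d * size:(d + 1) * size]) / size, 3)
        for d in range(10)
    ]

rows = make_data()
b0, b1 = fit_logistic(rows)
report = decile_report(rows, b0, b1)
print(report)  # top deciles should show higher response rates than bottom ones
```

The point is not the routine itself but what it leaves out: nothing in this pipeline decides which inputs belong in `rows` for a given business problem.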
Automation makes sense when the path to a given solution is quite clear. From the discussion above, we know which elements can and cannot be automated: mathematical routines and reports can be automated, but the use of data in defining the analytical environment varies from problem to problem and must be determined by the practitioner.
Besides data, interpretation of output is another area where the human brain holds the advantage over the computer. For example, suppose a statistical analysis reveals that new customers are more likely to cancel, which is expected since strong discounts were used to acquire them. If the company is now eliminating these discounts, a human being, but not the computer, would recognize that new customers should not be used in any customer retention models. Another great example is insurance response models where the insurance product itself is age-sensitive. As expected, age was the strongest predictor of response, yet age was missing for 50% of the customer base. Solutions therefore had to be developed on two customer segments (one where age was reported and one where age was missing). Besides building separate models on these segments, models were also built to predict age for customers where no age was present. One can surmise that the predictive analytics strategy for this company and its product was determined by the practitioner and not the computer.
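The two-segment strategy from the insurance example can be sketched as follows. The records, field names, and the stand-in age rule are all hypothetical; in practice, the age predictor would itself be a model built on the reported-age segment, not the simple average used here.

```python
# Hypothetical customer records; `age` is None for some customers,
# mirroring the insurance example above.
customers = [
    {"id": 1, "age": 34, "tenure": 5},
    {"id": 2, "age": None, "tenure": 12},
    {"id": 3, "age": 61, "tenure": 2},
    {"id": 4, "age": None, "tenure": 8},
]

# Step 1: the practitioner-chosen split into two modeling segments.
with_age = [c for c in customers if c["age"] is not None]
missing_age = [c for c in customers if c["age"] is None]

# Step 2: a stand-in "age model" for the missing-age segment.  Here a
# simple average of reported ages; in practice this would be a real
# predictive model built on the with-age segment.
def predict_age(customer, known):
    return sum(k["age"] for k in known) / len(known)

for c in missing_age:
    c["age_predicted"] = predict_age(c, with_age)

print(len(with_age), len(missing_age))
```

The split itself is trivial code; recognizing that the business problem demanded this split is the part no automated routine would have produced.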
What about implementation and tracking? One would think that once the solution is developed, implementation and tracking require a more automated approach. To some extent this is true, but in practice the many problems faced by organizations demand a more adaptive and flexible approach. Two primary goals need to be achieved when implementing and tracking results:
Implementation is never straightforward, as business needs evolve and new ones emerge. It is the analyst, not the computer, who will assess the current relevance of a model under a given scenario. Let's look at a couple of typical scenarios. Should a model built regionally in California be applied nationally? Should models built specifically for new customers all be rebuilt now that a completely revamped customer acquisition strategy was introduced last year? The answers to these questions require further analytics. The job of the practitioner is to determine the methodology or approach for answering the question, while automated tools such as standard reports or mathematical routines facilitate the practitioner's efforts in obtaining the answers.
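The "further analytics" behind the California question might look something like the following sketch. The data is synthetic, and the `drift` parameter that weakens the score-outcome link outside the region is purely an assumption for illustration: score both a regional and a national sample with the same model and compare top-decile lift.

```python
import math
import random

random.seed(1)

def score(x):
    # Stand-in for the model built on the California sample.
    return 1 / (1 + math.exp(-x))

def sample(n, drift=0.0):
    # Synthetic outcomes; `drift` weakens the predictor-outcome link
    # to mimic a population the model was not built on (an assumption).
    rows = []
    for _ in range(n):
        x = random.gauss(0, 1)
        p = 1 / (1 + math.exp(-(1.0 - drift) * x))
        rows.append((x, 1 if random.random() < p else 0))
    return rows

def top_decile_lift(rows):
    # Response rate in the top-scored decile relative to the overall rate.
    ranked = sorted(rows, key=lambda r: -score(r[0]))
    top = ranked[: len(ranked) // 10]
    overall = sum(y for _, y in rows) / len(rows)
    return (sum(y for _, y in top) / len(top)) / overall

california = sample(5000)           # population the model was built on
national = sample(5000, drift=0.6)  # weaker link elsewhere (assumed)
lift_ca = top_decile_lift(california)
lift_us = top_decile_lift(national)
print(round(lift_ca, 2), round(lift_us, 2))
```

Computing the lift is automatable; deciding that top-decile lift on a national holdout is the right test for this question is the practitioner's methodology call.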
The design of a measurement and tracking scheme will also change depending on the business problem. For example, I may want to track risk models over certain areas of the country because domain knowledge reveals that there are high-credit-risk areas versus low-credit-risk areas. At the same time, I may want to track specific segments or cohort groups of customers and see how the models perform against these different risk groups. Does the risk model erode more quickly over time for certain segments than for others? Meanwhile, for a given retail customer retention model, the organization may want to look at how the model performs within its stores. Once again, the resulting insights could be used to identify high-risk versus low-risk stores based on retention model scores.
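A bare-bones version of such a tracking scheme, with hypothetical store and quarter labels, might look like this. The report itself is automatable, but choosing which segments and periods to track is the practitioner's decision.

```python
from collections import defaultdict

# Hypothetical tracking records: (segment, period, model_score, outcome),
# e.g. retention-model scores tracked by store across two quarters.
records = [
    ("store_A", "Q1", 0.9, 1), ("store_A", "Q1", 0.8, 1),
    ("store_A", "Q2", 0.9, 1), ("store_A", "Q2", 0.7, 0),
    ("store_B", "Q1", 0.9, 1), ("store_B", "Q1", 0.6, 0),
    ("store_B", "Q2", 0.8, 0), ("store_B", "Q2", 0.5, 0),
]

def tracking_report(records):
    """Observed outcome rate per (segment, period) -- the automatable
    report; which segments and periods matter is a human judgment."""
    buckets = defaultdict(list)
    for segment, period, _score, outcome in records:
        buckets[(segment, period)].append(outcome)
    return {k: sum(v) / len(v) for k, v in buckets.items()}

report = tracking_report(records)
for key in sorted(report):
    print(key, round(report[key], 2))
```

Comparing the same segment across periods is how erosion would show up: a segment whose outcome rate falls from one quarter to the next while others hold steady is a candidate for model rebuilding.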
The above discussion focuses only on predictive analytics and does not consider other solutions a data scientist might build, such as a data discovery exercise where the outcome is a data-driven business strategy. Technology will continue to advance, and practitioners should always be open-minded about improving the effectiveness of our work through automation. But the human being's ability to adapt and apply his or her knowledge to a unique business situation can never be automated. Ultimately, it is about using technology to automate the right components so that the analyst can apply his or her skills and expertise toward solving more of these business problems.