In our Big Data world, software applications and programming tools continue to expand the data scientist’s toolkit for building predictive analytics solutions. Open-source tools have become far more prevalent in the last few years because they open this discipline to a much wider range of people, which in turn creates more potential for advancing knowledge. Academics and students are now focusing their intellectual energies on developing new processes and techniques that are improving many facets of our everyday life. With data science as the foundation, new tools have evolved that make it easier to work within the semi-structured and unstructured world of Big Data. Python, R, Pig, and Mahout are just some of the newer tools in the data scientist’s arsenal, alongside SAS, which, according to most benchmark studies, remains the dominant commercial data science application.
Thanks to the open-source environment, libraries and modules are being created to run analytical and functional routines as well as advanced mathematical routines. This is greatly accelerating the community’s knowledge base and its ability to produce solutions that address the right business problems. Yet in this all-consuming quest to build more and better solutions, little attention is devoted to effective measurement and validation of these solutions. Technology has empowered the data scientist, but the CEO or senior executive of any organization will ask the fundamental question: “Is the solution working?” In the world of CRM and database marketing, arguably one of the pioneers in using data science and predictive analytics techniques, measurement and validation approaches have existed for decades. The most common of these have been used to validate models for direct marketing programs. Decile charts, gains charts, and ROC curves with their AUC (area under the curve) help evaluate model performance by assessing how well the model rank orders the observed desired behavior when customers are ranked by predicted behavior.
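To make the traditional validation concrete, here is a minimal sketch, assuming each customer has a model score and a 0/1 flag for whether the desired behavior was later observed. It ranks customers by score, cuts them into equal-sized groups (deciles when `n_bins=10`), and reports the response rate and cumulative share of responders captured per group, which is what a decile or gains chart plots. AUC is computed as the probability that a randomly chosen responder outscores a randomly chosen non-responder. The function names `gains_table` and `auc` and the sample data are hypothetical; in practice a library routine such as scikit-learn’s `roc_auc_score` would usually compute the AUC.

```python
from typing import List, Sequence, Tuple


def gains_table(scores: Sequence[float], responded: Sequence[int],
                n_bins: int = 10) -> List[Tuple[int, int, float, float]]:
    """Rank customers by predicted score (highest first), cut them into n_bins
    near-equal groups, and return one row per group:
    (bin number, customers in bin, response rate, cumulative share of all responders)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n, total = len(order), sum(responded)
    rows, captured = [], 0
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        hits = sum(responded[i] for i in members)
        captured += hits
        rate = hits / len(members) if members else 0.0
        cum_gain = captured / total if total else 0.0
        rows.append((b + 1, len(members), rate, cum_gain))
    return rows


def auc(scores: Sequence[float], responded: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen responder is scored above a randomly chosen non-responder
    (ties count half). Pairwise O(P*N) version, fine for small illustrations."""
    pos = [s for s, y in zip(scores, responded) if y == 1]
    neg = [s for s, y in zip(scores, responded) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores and observed outcomes for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
for row in gains_table(scores, responded, n_bins=5):
    print(row)
print(round(auc(scores, responded), 3))
```

A model that rank orders well shows response rates falling from the top group to the bottom and a cumulative gain that climbs quickly toward 100%. In this toy data the top 20% of names capture half of all responders.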
But let’s look at some of the more recent targeting advancements, such as marketing optimization and net lift models. Marketing optimization takes basic model validation to a new level. In marketing optimization, multiple models are used alongside business constraints, with operations research as the mathematical tool that provides the appropriate solution. For example, a given customer may have ten product model scores, each indicating his or her likelihood to purchase a given product. Yet the business itself may be trying to invest in certain products that currently have low customer purchase rates but significant growth potential. Accordingly, business constraints might be established that promote these high-potential products even though other products have higher model scores. Operations research techniques (often referred to as O/R) elegantly assign products to customers based not only on product scores but also on ensuring that the constraint of promoting these high-growth-potential products is met.
But how do we validate marketing optimization within this basic framework? I remember asking this question at one conference and receiving a puzzled look and a very curt response along the lines of “It just works.” The short-term, tactical approach used to evaluate a single model does not apply here; the practitioner needs to look at these techniques over a much longer time frame. A two-pronged approach can be adopted. First, each model used in the optimization algorithm is evaluated using the traditional model validation approach discussed above. Second, because O/R techniques may not be optimizing the desired behavior of any given customer under the current business constraints, a longer-term perspective is needed when validating or measuring the success of these techniques.
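The constrained assignment described above can be illustrated with a deliberately tiny case. This is a minimal sketch, assuming each customer receives exactly one offer and there is a single business constraint: at least `min_growth` customers must be offered a designated growth product. For that one quota the greedy rule below is exact. It starts from each customer’s best-scoring product and moves the customers who give up the least score onto the growth product. Real O/R deployments with many products and constraints would typically use a linear or integer programming solver instead. The customer IDs, product names, scores, and the function `assign_offers` are all hypothetical.

```python
from typing import Dict

# Hypothetical model scores: customer -> {product: predicted purchase likelihood}.
Scores = Dict[str, Dict[str, float]]


def assign_offers(scores: Scores, growth_product: str, min_growth: int) -> Dict[str, str]:
    """Give each customer one product offer, maximising the total model score
    subject to at least `min_growth` customers being offered `growth_product`.

    For this single quota constraint the greedy rule below is exact: start from
    each customer's best-scoring product, then switch the customers who give up
    the least score ("regret") to the growth product until the quota is met.
    """
    if min_growth > len(scores):
        raise ValueError("quota exceeds the number of customers")
    # Unconstrained choice: each customer's highest-scoring product.
    best = {c: max(s, key=s.get) for c, s in scores.items()}
    # Score sacrificed by offering the growth product instead of the best one
    # (zero for customers whose best product already is the growth product).
    regret = {c: scores[c][best[c]] - scores[c][growth_product] for c in scores}
    assignment = dict(best)
    for c in sorted(scores, key=regret.get)[:min_growth]:
        assignment[c] = growth_product
    return assignment


scores = {
    "cust_a": {"card": 0.60, "loan": 0.30, "insurance": 0.10},
    "cust_b": {"card": 0.50, "loan": 0.45, "insurance": 0.20},
    "cust_c": {"card": 0.20, "loan": 0.70, "insurance": 0.25},
}
print(assign_offers(scores, "insurance", min_growth=0))
# unconstrained: cust_a card, cust_b card, cust_c loan
print(assign_offers(scores, "insurance", min_growth=2))
# cust_b and cust_c move to insurance (smallest regrets, about 0.30 and 0.45)
```

The example also shows why a single model’s score cannot validate the optimization. Under the quota, two customers are deliberately offered a product with a lower predicted likelihood, so success has to be judged on business outcomes over time.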
One measurement and validation approach is to compare a holdout group, where traditional target marketing techniques were used, with a group where O/R optimization techniques were used. Reports can then be created that track the overall customer base of each group over time across some key business indicators:
In the above report, we can see how the key business metrics of spend and attrition change over time for these two groups. With this template, incrementality across key business measures can then be determined. Keep in mind, though, that as long as certain business constraints apply, O/R will never deliver the results that an unconstrained optimization would. However, we can determine whether the gap measuring incrementality between the control group and the O/R group is narrowing or growing.
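Such a report can be sketched in a few lines. This sketch assumes customer-level records tagged with a group label (hypothetically `"holdout"` and `"optimized"`), a reporting period, spend in that period, and an attrition flag. It averages spend and attrition per group and period, computes the O/R group’s incrementality over the holdout for each period, and fits a least-squares slope to that gap so the trend can be read at a glance. The functions `group_report`, `incrementality`, and `slope` and the sample records are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# One row per customer per period: (group, period, spend, attrited).
Record = Tuple[str, int, float, bool]
Report = Dict[Tuple[str, int], Tuple[float, float]]


def group_report(records: Iterable[Record]) -> Report:
    """Average spend and attrition rate for every (group, period) cell."""
    totals = defaultdict(lambda: [0.0, 0, 0])  # spend sum, attriters, customers
    for group, period, spend, attrited in records:
        cell = totals[(group, period)]
        cell[0] += spend
        cell[1] += int(attrited)
        cell[2] += 1
    return {key: (s / n, a / n) for key, (s, a, n) in totals.items()}


def incrementality(report: Report, treated: str = "optimized",
                   control: str = "holdout") -> List[Tuple[int, float, float]]:
    """Per period: (period, spend lift, attrition reduction) of treated vs. control."""
    periods = sorted({p for g, p in report if g == treated}
                     & {p for g, p in report if g == control})
    rows = []
    for p in periods:
        t_spend, t_attr = report[(treated, p)]
        c_spend, c_attr = report[(control, p)]
        rows.append((p, t_spend - c_spend, c_attr - t_attr))
    return rows


def slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys on xs. For a positive lift, a positive slope
    means the gap in favour of the O/R group is widening."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two periods")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Made-up records for two customers per group over two periods.
records = [
    ("holdout", 1, 100.0, False), ("holdout", 1, 80.0, True),
    ("optimized", 1, 110.0, False), ("optimized", 1, 90.0, False),
    ("holdout", 2, 100.0, False), ("holdout", 2, 60.0, True),
    ("optimized", 2, 120.0, False), ("optimized", 2, 100.0, True),
]
report = group_report(records)
rows = incrementality(report)
print(rows)
print(slope([p for p, _, _ in rows], [lift for _, lift, _ in rows]))
```

On these made-up numbers the spend lift grows from 10.0 to 30.0 between the two periods, so the slope is positive, while the attrition advantage disappears in period 2. That is the kind of divergence between metrics the longer-term view is meant to surface.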
Let’s now look at a more recent phenomenon: the validation of net lift models. Once again, validating these models is not the simple process of determining their rank-ordering effectiveness (decile or gains charts) or their ability to capture a certain percentage of the desired behavior at a given model cutoff (Gini coefficient or KS statistic). In net lift models, control (do-not-promote) names need to be identified within the campaign, while untargeted names also need to be promoted in order to obtain a full read of the net lift results. Under this scenario, larger groups of undesirable names need to be promoted in order to obtain statistically significant results that measure incremental performance between the marketed group and the non-marketed group. An example of a net lift model validation approach is listed below:
Clearly, the above report would yield information on the performance of these net lift models. Such reports could be set up to run at regular time intervals, allowing the practitioner to determine how results are changing over time. Are the results consistent or unstable as we look at these net lift results over a given period of time?
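A net lift read of this kind can be sketched as follows. The sketch assumes each campaign name carries its net-lift model decile (decile 1 = highest predicted incremental response), a flag for whether it was promoted or held out as a control, and a response flag. For each decile it reports net lift, defined as the promoted response rate minus the control response rate, together with the usual standard error for a difference of two independent proportions. It also checks whether observed lift still falls from decile to decile. Running it on successive campaign periods is one way to answer the stability question. The helper names (`net_lift_by_decile`, `rank_orders`, `cell`) and the two periods of data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One row per campaign name: (net-lift decile, promoted?, responded?).
Record = Tuple[int, bool, bool]


def net_lift_by_decile(records: Iterable[Record]) -> Dict[int, Tuple[float, float]]:
    """Per decile: (promoted response rate minus control response rate,
    standard error of that difference)."""
    # Counts per decile: promoted n, promoted responders, control n, control responders.
    cells = defaultdict(lambda: [0, 0, 0, 0])
    for decile, promoted, responded in records:
        counts = cells[decile]
        i = 0 if promoted else 2
        counts[i] += 1
        counts[i + 1] += int(responded)
    out = {}
    for d in sorted(cells):
        pn, pr, cn, cr = cells[d]
        if pn == 0 or cn == 0:
            continue  # lift cannot be read without both promoted and control names
        p_rate, c_rate = pr / pn, cr / cn
        se = math.sqrt(p_rate * (1 - p_rate) / pn + c_rate * (1 - c_rate) / cn)
        out[d] = (p_rate - c_rate, se)
    return out


def rank_orders(lift: Dict[int, Tuple[float, float]]) -> bool:
    """True if observed net lift never increases from one decile to the next."""
    values = [lift[d][0] for d in sorted(lift)]
    return all(a >= b for a, b in zip(values, values[1:]))


def cell(decile: int, promoted: bool, responders: int, non_responders: int) -> List[Record]:
    """Hypothetical helper that expands counts into one record per name."""
    return [(decile, promoted, True)] * responders + [(decile, promoted, False)] * non_responders


# Made-up campaign reads for two periods, ten names per cell.
period_1 = cell(1, True, 4, 6) + cell(1, False, 1, 9) + cell(2, True, 2, 8) + cell(2, False, 1, 9)
period_2 = cell(1, True, 2, 8) + cell(1, False, 2, 8) + cell(2, True, 3, 7) + cell(2, False, 1, 9)
for name, period in (("period 1", period_1), ("period 2", period_2)):
    lift = net_lift_by_decile(period)
    print(name, {d: (round(v[0], 2), round(v[1], 2)) for d, v in lift.items()}, rank_orders(lift))
```

In this toy data the model rank orders in period 1 but not in period 2. With only ten names per cell, the standard error in decile 2 of period 1 (about 0.16) exceeds the observed lift of 0.1, which illustrates why larger promoted and control groups are needed before such swings can be treated as real instability.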
In this article, I have looked at a few examples of how validation is conducted in the world of CRM and database marketing. Predictive analytics, though, is being embraced well beyond the marketing world. Are these other sectors adopting the same validation and measurement rigor that database marketers have been employing for decades? This will be discussed more fully in the next article.
Richard Boire, B.Sc. (McGill), MBA (Concordia), is the founding partner of the Boire Filler Group. A nationally recognized expert in the database and data analytics industry, he is among the top experts in this field in Canada, with unique expertise and background experience.
Mr. Boire’s mathematical and technical expertise is complemented by experience working at and with clients in both B2C and B2B environments. He has worked at and with clients such as Reader’s Digest, American Express, Loyalty Group, and Petro-Canada, among many others, establishing his top-notch credentials.
After 12 years of progressive data mining and analytical experience, Mr. Boire established his own consulting company, Boire Direct Marketing, in 1994. He writes numerous articles for industry publications, is a sought-after speaker on data mining, and works closely with the Canadian Marketing Association in a number of areas, including Education and the Database and Technology councils. He is currently the Chair of Predictive Analytics World Toronto.