Data miners employ a variety of techniques to develop robust predictive models. Often, analysts are confronted with a dilemma. Should we construct one model to address the business objective, or might multiple models be in order? Take, for example, a marketer who has a presence on the East Coast and in the Midwest. Will one analysis be sufficient, or would splitting the universe by geography and formulating a separate model for each population provide enhanced results? After all, behaviors in different parts of the country may very well differ. When is it appropriate to construct multiple models to attack a problem? And when does it make sense to use the single-model approach?
Often, we target the whole population and construct a predictive model that is developed on this entire universe as one sample. This I refer to as the “Conquer” approach. No attempt is made to find the smaller groups that make up the entire population, develop a model on each group, and then bring those models together at the conclusion to score and evaluate the entire population.
Take, for example, a common application: attrition. A good argument can be made that there exist individuals who find the quality of a service to be poor. Hence, they leave the franchise. On the other hand, there may very well exist customers who are cost conscious and can no longer afford the service fees. Hence, they attrite. It seems plausible that the predictors for the quality segment would be different from the predictors emerging from an analysis of the cost-conscious group.
Or take the service provider that conducts a campaign to attract new customers. There may very well be those responding who are in urgent need of the service. Yet, there may be other responders, who have no current need for the service, but are attracted by the modest service charge that accompanies registering for the product.
Granted, identifying the reason for the attrition or the reason for the response might be challenging. But if it were possible, developing two separate models would be intuitively appealing.
If we can determine that potential predictors from each group are different, that may provide a clue as to the need for developing separate models for each of the segments. We’ll use this rule in a few moments.
Generally, groups may be derived through intuition, a TREE analysis, or a clustering approach. I will not address these more formal clustering algorithms. I hope to do so in a future piece.
Potentially, separate models can be developed for each segment, and then they can be unified before customer scoring. I refer to this as “Divide and Conquer.”
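The mechanics of “Divide and Conquer” scoring can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two geographic segments echo the marketer example above, while the field names (region, past_purchases) and the two toy scoring formulas are hypothetical stand-ins for real fitted models. Note also that ranking the combined file by score presumes the segment models produce comparable probabilities.

```python
# Hypothetical segment rule: route each record by region.
def assign_segment(record):
    return "east_coast" if record["region"] == "east" else "mid_west"

# Hypothetical per-segment "models": toy formulas standing in for fitted models.
segment_models = {
    "east_coast": lambda r: min(1.0, 0.02 + 0.001 * r["past_purchases"]),
    "mid_west": lambda r: min(1.0, 0.01 + 0.003 * r["past_purchases"]),
}

def score_population(records):
    """Score each record with its own segment's model, then unify and rank."""
    scored = []
    for r in records:
        segment = assign_segment(r)
        score = segment_models[segment](r)
        scored.append({**r, "segment": segment, "score": score})
    # Unify: one ranked list covering the entire population.
    return sorted(scored, key=lambda r: r["score"], reverse=True)

records = [
    {"id": 1, "region": "east", "past_purchases": 10},
    {"id": 2, "region": "midwest", "past_purchases": 10},
    {"id": 3, "region": "east", "past_purchases": 0},
]
ranked = score_population(records)
for r in ranked:
    print(r["id"], r["segment"], round(r["score"], 3))
```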
Let’s take a retailer who is developing a model to locate likely responders to a recent direct mail campaign. Data is available that includes past spending in different departments, as well as some additional generic data.
Let’s develop a predictive model (based on logistic regression) that includes the entire universe. The final parameters looked like this.
First, a short description. A gains table is a performance report of model results. Each observation is placed into one of ten equal-sized groups (deciles), with the observations in the first group having the highest predicted probability of response. Cumulative numbers of observations and responders are also included in the chart.
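For readers who want to see the construction, here is a minimal sketch of a gains table in plain Python. The function name, the column names, and the synthetic scores and responses are all assumptions made for illustration; they are not the retailer's data. The sketch also assumes there are at least as many records as groups.

```python
import random

def gains_table(scores, responses, n_groups=10):
    """Rank records by score (highest first), cut into equal groups,
    and report per-group and cumulative observations and responders."""
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    rows = []
    cum_obs = 0
    cum_resp = 0
    for g in range(n_groups):
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        resp = sum(r for _, r in group)
        cum_obs += len(group)
        cum_resp += resp
        rows.append({
            "decile": g + 1,
            "obs": len(group),
            "responders": resp,
            "response_rate": resp / len(group),
            "cum_obs": cum_obs,
            "cum_responders": cum_resp,
            "cum_response_rate": cum_resp / cum_obs,
        })
    return rows

# Synthetic example: higher scores respond more often (illustrative only).
random.seed(0)
scores = [random.random() for _ in range(1000)]
responses = [1 if random.random() < 0.2 * s else 0 for s in scores]

table = gains_table(scores, responses)
for row in table:
    print(row["decile"], row["obs"], row["responders"], round(row["cum_response_rate"], 3))
```

The cumulative response rate in the fifth row is the figure a marketer contacting the top half of the file would care about.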
Producing a GAINS table provides the following:
While there are many nuances in the above chart, I’ll focus on two:
Now the analyst must decide: Am I finished, or can I further improve results?
One tactic to potentially enhance our GAINS table involves the Divide and Conquer approach. What’s involved?
Keep in mind that other modeling tools in addition to the logistic regression referred to above can and should be used.
Let’s start with the TREE. It looks like this:
Four segments emerged: Nodes 3, 4, 5, and 6 [bottom 4 boxes]. The highest response rate appeared at Node 6. This segment can be described as “Those spending in Department 24, and those spending in Department 17”.
After constructing the TREE analysis and classifying each record into the appropriate node (branch), we proceed with a ‘feature selection’ procedure. While there are many methods to select the ‘best’ potential predictors, the feature selection I frequently choose calculates an F-value and its associated p-value for each candidate predictor.
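As a rough sketch of what such a screen computes, the Python below calculates a one-way ANOVA F-value for a numeric predictor split by responder versus non-responder. This is one common reading of “F-value” feature selection and is an assumption here; the article does not say which software or exact procedure was used. The toy data and the variable NOISE_VAR are hypothetical. The associated p-value would come from the F distribution with (k - 1, n - k) degrees of freedom, which a statistics library can supply.

```python
def f_value(values, labels):
    """One-way ANOVA F statistic of a numeric predictor across the classes in labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n = len(values)
    k = len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Variation explained by class membership.
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    # Variation left over inside each class.
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy records from a single node (illustrative values only).
responded = [0, 0, 0, 1, 1, 1]
candidates = {
    "CNT_MONTH": [1, 2, 3, 6, 7, 8],   # separates responders well
    "NOISE_VAR": [4, 1, 4, 2, 5, 2],   # hypothetical variable with no separation
}

f_scores = {name: f_value(vals, responded) for name, vals in candidates.items()}
best_first = sorted(f_scores, key=f_scores.get, reverse=True)
print(best_first, f_scores)
```

Running the same screen separately inside each node yields one ranked predictor list per segment, which is what the comparison across nodes relies on.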
If I look at each of the four nodes separately, the critical question I ask is “Are the features (predictors) the same in all the nodes?” Below are the ‘best predictors’ using the feature selection approach mentioned above. Identical colors point to the same variable. So in Node 3 and Node 4 we notice, for example, that the blue cell CNT_MONTH (the number of months during the year in which a purchase occurred) appears in both sections. We need to determine whether there is an adequate variety of predictors across the four segments.
Glancing over the four tables, it appears that there may indeed be an adequate variety of features to warrant separate models for each branch.
Four separate models were constructed. The resulting gains table appears below.
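A glance is usually enough, but the same comparison can be given a number. The sketch below computes the Jaccard overlap (shared predictors divided by all distinct predictors) for each pair of nodes; low overlap suggests the segments are driven by different variables. Measuring overlap this way is my own illustrative suggestion, and apart from CNT_MONTH appearing in Nodes 3 and 4, every predictor name in the lists is a hypothetical placeholder.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of predictors two lists have in common (1.0 = identical sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Placeholder top-predictor lists per node; only CNT_MONTH is from the example.
top_predictors = {
    "node_3": ["CNT_MONTH", "VAR_A", "VAR_B"],
    "node_4": ["CNT_MONTH", "VAR_C", "VAR_D"],
    "node_5": ["VAR_A", "VAR_E", "VAR_F"],
    "node_6": ["VAR_B", "VAR_G", "VAR_H"],
}

overlap = {
    (m, n): jaccard(top_predictors[m], top_predictors[n])
    for m, n in combinations(top_predictors, 2)
}
for pair, value in overlap.items():
    print(pair, round(value, 2))
```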
Let’s do a quick comparison of the two statistics I alluded to above.
As our retailer will be contacting 50% of his base, we need to observe the cumulative response rates at the 5th decile. These can be obtained by inspecting the gains tables. We notice the following.
The response rates are significantly different at the 95% confidence level.
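The article does not state which test produced this result. One common choice for comparing two response rates is a two-proportion z-test, sketched below. The counts are hypothetical, chosen only to show the calculation, and the test treats the two samples as independent, which is an approximation when both models are evaluated on the same records.

```python
import math

def two_proportion_z(resp_a, n_a, resp_b, n_b):
    """Two-sided z-test for a difference between two response rates."""
    p_a = resp_a / n_a
    p_b = resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical top-five-decile counts: single model vs. multiple models.
z, p_value = two_proportion_z(resp_a=1_500, n_a=50_000, resp_b=1_650, n_b=50_000)
print(round(z, 2), round(p_value, 4))
print("significant at the 95% level:", p_value < 0.05)
```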
The number of mailed names will be 6,000,000. Let’s do a quick analysis to determine the value of the multiple models to our marketer. The table below highlights that, by using the multi-model approach, 8,400 additional responders would be realized. At the expected revenue of $195 per responder, we arrive at an additional $1,638,000 in revenue!
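The arithmetic is easy to check. The short Python sketch below reproduces the revenue figure from the numbers quoted above; the implied lift in response rate is my own back-calculation and assumes the 8,400 additional responders are spread over all 6,000,000 mailed names.

```python
mailed_names = 6_000_000
additional_responders = 8_400
revenue_per_responder = 195  # dollars

# Incremental revenue attributable to the multiple-model approach.
incremental_revenue = additional_responders * revenue_per_responder

# Back-calculated lift in response rate (assumption: all mailed names count).
implied_rate_lift = additional_responders / mailed_names

print(f"Incremental revenue: ${incremental_revenue:,}")
print(f"Implied response-rate lift: {implied_rate_lift:.2%}")
```

A lift of roughly fourteen-hundredths of a percentage point looks small, yet at this mailing volume it is worth more than $1.6 million.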
It seems our hunch was correct. Enhanced results are available by using the multiple model approach. Of course, this added benefit has to be weighed against the incremental cost of developing the supplementary analyses. In this case, it was well worth it.
The multiple model approach can often offer potential value. And there is a simple way of gauging whether additional efforts should be employed to develop these algorithms. By using a feature selection technique, the analyst can get a reasonable feeling as to the utility of additional model development.
ABOUT THE AUTHOR
Sam Koslowsky, a pioneer in the predictive analytic arena, has some thirty years’ experience in providing quantitative solutions for a diverse group of Fortune 500 firms. Currently, he is Senior Analytic Consultant at Harte Hanks. His assignments include formulating predictive models and developing advanced segmentation schemes.
Sam spent a decade with American Express, the last four as Vice President of Database Marketing, Modeling, and Analysis. His responsibilities included providing quantitative solutions to business problems, as well as providing strategic planning to a varied mixture of businesses under the American Express umbrella.
Sam is a frequent speaker at industry meetings and publishes regularly in marketing related journals. His most recent article on the Use of Genetic Algorithms appeared in the latest edition of The Direct Marketing Research Journal.
He is an active member of the Direct Marketing Association, American Marketing Association, American Statistical Association, and INFORMS, an organization that disseminates and encourages application of quantitative techniques to address strategic and marketing issues. Sam has also lectured at both New York and Columbia Universities.
Sam has an MBA in finance from New York University, where he was also granted an Advanced Professional Certificate for postgraduate study in advanced quantitative methods. He has an undergraduate degree in mathematics from Yeshiva University, New York. Sam can be reached at sam.koslowsky@hartehanks.com
Predictive Analytics Times © 2018 • 211 E. Victoria Street, Suite E •
Santa Barbara, CA 93101
Produced by: Rising Media & Prediction Impact