By: Geert Verstraeten, Predictive Analytics advocate, Managing Partner and Professional Trainer, Python Predictions
Over the past five years, the analytics scene has been dominated by the prediction that we would soon face a serious shortage of analytical talent. As a result, academic programs and massive open online courses (MOOCs) have sprung up like mushrooms after the rain, all aimed at developing the skills of the analyst or the more modern counterpart, the data scientist. More recently, organizations have become increasingly aware that the main lever for success with predictive analytics lies in developing managerial skills in this domain (see our previous PATimes article). And there lies our greatest challenge today: how to offer relevant tools that allow managers to be impactful in an engaging way, without drowning our ‘sweet victims’ in technology and jargon.
For managers, most analytics training falls short in a critical way. The vast majority of the new analytics training focuses on core analytics methods and model building, not on the organizational process needed to apply them. In my personal opinion, the single most important tool for any manager is an understanding of the process to be managed. When asked to supervise predictive analytics developments, the absolute essence lies in having a solid understanding of the main project phases required. (For the sake of clarity: a predictive model is a representation of the way we understand a phenomenon or, if you will, a formula that combines predictive information so as to optimally predict future behavior or events.) Obviously, we are not the first to realize that this is vital. Tools have been developed to describe the process methodology for developing predictive models. However, it is difficult for non-experts to become excited about these tools, as they describe the phases in a rather dry way.
Given our history (for the past nine years, our organization has spent its time exclusively on developing predictive models and managing their development), we have experimented with different ways to present process methodology in a more fun and engaging way. We no longer experiment. In our meetings with prospects, trainings for managers, business presentations, and all kinds of meetings with different stakeholders (such as sales executives who need a minimum of trust in what has been provided), we present the development of analytical models as being as simple as making soup in a soup bar.
The first phase, taking the order, is concerned with understanding the organization’s needs, priorities, desires and resources. Taking the order basically means we should start by carefully exploring what it is that we need to predict. Do we want to predict who will leave our organization in the next year and, if so, how will we define this concretely? Once the order is clear, it is time to check the stock to make sure we will be able to cook the desired dish. This is no different from checking data availability. Additionally, it is important to have an idea about timing: will our client need to leave on time to catch the latest movie? This is pretty similar to drawing up a project plan.
The second phase, the mise en place, deals with preparing all useful data so that they are ready to be used in the subsequent analysis. For those not familiar with (French) cooking jargon, mise en place is a term used in professional kitchens to refer to organizing and arranging the ingredients (e.g., cuts of meat, relishes, sauces, par-cooked items, spices, freshly chopped vegetables, and other components) that a cook will require for the menu items that are expected to be prepared during a shift (see Wikipedia). Data are for predictive analytics what ingredients are for making soup. In predictive analytics, data are gathered, cleaned and often sliced and diced so that they are ready to use in a later analytical stage.
In the third phase, cooking the soup, the main task lies in choosing exactly those ingredients that blend into a great result. This is no different in predictive modeling, where the absolute essence lies in selecting those variables that are jointly capable of predicting the event of interest. One does not make a great soup with only onions. Obviously, not only the presence of ingredients is relevant, but also the proportions in which they are used. Compare this to the parameters of predictors: not every predictor is equally important for obtaining a high-quality result. Finally, cooking techniques matter just as much as algorithms do in predictive analytics: they represent essentially different ways to combine the same ingredients into the best soup.
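To make the idea of proportions concrete, here is a minimal sketch of how a model can combine several predictors, each with its own weight, into a single probability of leaving. The predictor names, the weights, and the choice of a logistic formula are all illustrative assumptions, not taken from any real model; in an actual project the weights would be estimated from data.

```python
import math

# Hypothetical predictors and weights (the "proportions" of each ingredient).
# In a real project these parameters are estimated from data, not typed in.
WEIGHTS = {
    "complaints_last_year": 0.8,   # more complaints -> higher risk
    "years_as_customer": -0.3,     # longer tenure -> lower risk
    "has_discount": -0.5,          # a discount lowers the risk
}
INTERCEPT = -1.0  # baseline level when all predictors are zero


def churn_probability(customer):
    """Combine the predictors into one probability using a logistic formula."""
    score = INTERCEPT + sum(
        weight * customer[name] for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


loyal = {"complaints_last_year": 0, "years_as_customer": 8, "has_discount": 1}
at_risk = {"complaints_last_year": 3, "years_as_customer": 1, "has_discount": 0}

print(f"loyal customer:   {churn_probability(loyal):.2f}")
print(f"at-risk customer: {churn_probability(at_risk):.2f}")
```

The same three ingredients produce very different scores for the two customers, simply because each ingredient enters the recipe in a different proportion.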
The fourth phase is tasting the soup. In cooking, it is crucial to taste a dish before it is served. This is very similar to model validation in predictive model building. Both technical and business-relevant measures can be used to determine objectively whether a model built on a specific data set will hold true for new data. As long as the soup does not taste good, we can iterate back to cooking until the final soup is approved, i.e., until the champion model is selected.
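The tasting step can be sketched in a few lines. The example below is only an illustration under stated assumptions: the customers are simulated, the "model" is a deliberately simple one-rule classifier, and accuracy stands in for the many technical and business measures one could use. What it shows is the principle: part of the data is set aside, and the model is judged on data it has never seen.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable


def make_customer():
    """Simulate one customer: more complaints means a higher chance of leaving."""
    complaints = random.randint(0, 5)
    left = random.random() < 0.1 + 0.15 * complaints
    return complaints, left


customers = [make_customer() for _ in range(2000)]

# Keep part of the data aside: the model never sees it while being built.
train, holdout = customers[:1400], customers[1400:]


def accuracy(threshold, data):
    """Share of customers correctly classified by the rule
    'leaves if complaints >= threshold'."""
    hits = sum((complaints >= threshold) == left for complaints, left in data)
    return hits / len(data)


# "Cooking": pick the threshold that works best on the training data.
best_threshold = max(range(7), key=lambda t: accuracy(t, train))

# "Tasting": check whether that performance holds on data not used in cooking.
print(f"threshold: {best_threshold}")
print(f"training accuracy: {accuracy(best_threshold, train):.2f}")
print(f"hold-out accuracy: {accuracy(best_threshold, holdout):.2f}")
```

If the hold-out accuracy were far below the training accuracy, that would be the signal to go back to cooking.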
The last phase, serving the soup, is all about how to arrange everything on the plate and how to serve the dish professionally. Or: how to present and profile the results in such a way that they provide maximum value to the audience. It is possible to create a great soup and present it in such an awful way that it will not be eaten. The same holds true for predictive models: it is possible to have a model with fantastic performance but fail to convince potential users because key insights are missing. By presenting the soup professionally, we increase the odds that it will be eaten. Once it is being eaten, we can check in regularly to confirm that everything is fine. In predictive modeling, this often means designing a controlled field experiment, for example a set of retention campaigns targeting those with the highest potential to leave.
This simple, intuitive process has been important to us in allowing managers to engage with the process in a fun way. Presenting the process in a non-technical way makes it digestible (to be fair, I’ve stolen this phrase from my friend Andrew Pease, Global Practice Analytics Lead at SAS, because it makes such great sense in this context). However, it should remain clear that it is only a metaphor. At some point, building predictive models is obviously also different from making soup. Every phase, especially project definition, involves many more components than those where a link with soup can be found. But the metaphor gets us where we want to be: a point where a discussion is possible on what is needed to develop predictive models, and where a minimum of trust can exist. It ensures we get on speaking terms with decision makers and all those who will be impacted by the models developed.
Notes and further reading: we fully realize this is not completely different from CRISP-DM, the Cross Industry Standard Process for Data Mining, which was developed in 1996 and is still the leading process methodology, used by 43% of analysts and data scientists. However, unless you are a veteran and/or an analyst, it is difficult to get really excited about CRISP-DM or its typical visualization. For those looking for a more in-depth understanding of the process, I recommend reading the modern answer to CRISP-DM, the Standard Methodology for Analytical Models (SMAM).
Geert Verstraeten is Managing Partner at Python Predictions, a niche player in the domain of Predictive Analytics. He has over 10 years of hands-on experience in Predictive Analytics and in training predictive analysts. His main interest lies in enabling clients to take their adoption of analytics to the next level. Details on his next training for managers (Brussels, October 1st) can be found.