In anticipation of his upcoming conference co-presentation, Advanced Analytics and the Corporate Audit Function, at Predictive Analytics World San Francisco, April 3-7, 2016, we asked Matthew Pietrzykowski, Senior Data Scientist at General Electric, a few questions about his work in predictive analytics.
Q: In your work with predictive analytics, what behavior or outcome do your models predict?
A: The predictive models generated in GE’s Corporate Audit Function focus heavily on classification, forecasting, and optimization. The types of models used range from logistic regression to random forest classification models. Typically, the models are built to help auditors assess whether there is evidence to support an auditable event, or to find an optimal or reasonable outcome. The models tend to draw on mixed data types, and some are augmented with the results of text mining short-form narrative fields.
Q: How does predictive analytics deliver value at your organization – what is one specific way in which it actively drives decisions or operations?
A: In the corporate audit space, predictive analytics has seen sporadic use, since most internal audit work is retrospective, with a focus on uncovering mechanisms of potential failure rather than predicting new cases. However, we have found great potential for it in reducing false positives through targeted reviews of audit fieldwork, as well as in risk assessment. As an example, predictive analytics is being used in executive-level planning to help with auditor deployment. The model predicts which business sites have a greater risk of showing red flags.
However, it is important to note that predictive analytics is not just essential in the corporate audit function of GE. It is also how GE, the world’s leading Digital Industrial Company, is transforming industry with software-defined machines and solutions that are connective, responsive and predictive.
Q: Can you describe a quantitative result, such as the predictive lift of your model or the ROI of an analytics initiative?
A: One of the models produced predicts the risk of an auditable outcome by classifying business sites using multiple disparate data sets. The goal was to compile, as potential inputs, the different resources that are used in a typical audit analysis. These data came from different sources with different schemas, so the blending problem was of particular concern. The final model achieved ~90% classification accuracy on test data, which is a ~23% improvement over the base rate.
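Editor's note: the answer does not say whether the ~23% improvement is measured in percentage points or relative to the base rate. The short sketch below, which is our own illustration and not part of the interview, works out the base rate implied by each reading. The function names are hypothetical.

```python
# Two possible readings of "~90% accuracy, ~23% improvement over base rate".
# The interview does not say which is meant; both are shown as assumptions.

def implied_base_rate_absolute(accuracy: float, improvement_points: float) -> float:
    """Base rate if the improvement is in percentage points (accuracy - base)."""
    return accuracy - improvement_points


def implied_base_rate_relative(accuracy: float, relative_improvement: float) -> float:
    """Base rate if the improvement is relative (accuracy / base - 1)."""
    return accuracy / (1.0 + relative_improvement)


accuracy = 0.90      # approximate figure quoted in the interview
improvement = 0.23   # approximate figure quoted in the interview

absolute_base = implied_base_rate_absolute(accuracy, improvement)
relative_base = implied_base_rate_relative(accuracy, improvement)

print(f"percentage-point reading: base rate of about {absolute_base:.0%}")
print(f"relative reading:         base rate of about {relative_base:.0%}")
```

Under the percentage-point reading the implied base rate is about 67%; under the relative reading it is about 73%. Either way, the base rate here means the accuracy of always predicting the most common class.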
Q: What surprising discovery or insight have you unearthed in your data?
A: Some of the more surprising outcomes and insights came from necessity. Most of our data is of mixed types, with continuous, categorical, and short-form text fields. Text mining the narrative fields has produced more insightful and more impactful overall modeling results than omitting them. As an example, we helped one of the businesses leverage their short-form narrative fields by mining them, summarizing them into semantic clouds, and aggregating the results into a summary measure over time. This time series can then be analyzed for trends that are potential markers for risk events. We are even seeing evidence to suggest that document term matrices can be used as differentiable attribute data in classification models.
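Editor's note: as a rough illustration of the idea described above, the sketch below builds a small document term matrix from short narratives and aggregates it into a per-month summary measure. It is not GE's method: the narratives, the risk terms, and the helper names are all hypothetical, and a simple keyword flag stands in for the semantic clouds mentioned in the answer.

```python
from collections import Counter, defaultdict
import re

# Hypothetical short-form narratives as (month, text) pairs; not real audit data.
narratives = [
    ("2016-01", "Invoice approved without supporting documentation"),
    ("2016-01", "Routine inventory count completed"),
    ("2016-02", "Missing documentation for vendor invoice"),
    ("2016-02", "Manual override of invoice approval, documentation missing"),
]

# Hypothetical terms an analyst might treat as risk markers.
RISK_TERMS = {"missing", "override", "without"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(texts):
    """Return (vocabulary, matrix) with one row per document, one column per term."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


vocabulary, matrix = document_term_matrix([text for _, text in narratives])

# Summary measure: share of narratives per month containing any risk term.
risk_columns = [i for i, term in enumerate(vocabulary) if term in RISK_TERMS]
flags_by_month = defaultdict(list)
for (month, _), row in zip(narratives, matrix):
    flags_by_month[month].append(any(row[i] > 0 for i in risk_columns))

risk_share = {m: sum(f) / len(f) for m, f in sorted(flags_by_month.items())}
print(risk_share)
```

The rows of `matrix` could also be appended to the continuous and categorical fields as extra input columns for a classifier, which is one way to read the remark about using document term matrices as attribute data.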
Q: Sneak preview: Please tell us a take-away that you will provide during your talk at Predictive Analytics World.
A: Data science, and in particular predictive analytics, has a place in the corporate audit function. In fact, it’s a strategic part of how GE audits its various businesses in dispersed geographic locations. Advanced analytics is a core requirement for our auditors so that they can leverage it in a scientific manner while they are actively auditing our business sites. The value is seen not only in risk abatement, planning, and forecasting, but also in the paradigm shift it is forcing in the organization.
Don’t miss Matthew’s conference co-presentation, Advanced Analytics and the Corporate Audit Function, on Monday, April 4, 2016, from 11:20 am to 12:05 pm at Predictive Analytics World San Francisco. Use code PATIMES16 for 15% off current prices (excludes workshops).
By: Eric Siegel, Founder, Predictive Analytics World
Eric Siegel is the founder of Predictive Analytics World (www.pawcon.com), the leading cross-vendor conference series, which consists of 10 annual events in New York, Chicago, San Francisco, Washington DC, London, and Berlin. He is also the author of the award-winning book Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die – Revised and Updated Edition (Wiley, 2016).