Do Predictive Modelers Need to Know Math?

Predictive analytics is just a bunch of math, isn’t it? After all, matrix algebra, summations, integrals, multiplications, and additions are the core of what predictive modeling algorithms do. Even rule-based approaches need math to compute how good their if-then-else rules are.

I was participating in a predictive analytics course recently, and at the end of two days of instruction a participant asked this question: “It’s been a long time since I’ve had to do this kind of math and I’m a bit rusty. Is there a book that would help me learn the techniques without the math?”

The question about math was interesting. But do we need to know the math to build models well? Anyone can build a bad model, but to build a good model, don’t we need to know what the algorithms are doing? The answer, of course, depends on the role of the analyst. I contend, however, that for most predictive analytics projects, the answer is “no”.

Let’s consider building decision tree models. What options does one need to set to build good trees? Here is a short list of common knobs that can be set by most predictive analytics software packages:

Splitting metric (CART-style trees, C5-style trees, CHAID-style trees, etc.)
Terminal node minimum size
Parent node minimum size
Maximum tree depth
Pruning options (standard error, Chi-square test p-value threshold, etc.)
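To make these knobs concrete, here is a minimal sketch in Python of a tree grower, assuming a single numeric input and binary splits only. The `TreeKnobs` class, its field names, and the `grow` function are hypothetical and not taken from any package; pruning and CHAID-style multi-way splits are deliberately left out.

```python
from collections import Counter
from dataclasses import dataclass
from math import log2


@dataclass
class TreeKnobs:
    """Hypothetical knob names mirroring the list above (pruning not modeled)."""
    criterion: str = "gini"  # "gini" (CART-style) or "entropy" (C5-style)
    min_leaf: int = 1        # terminal node minimum size
    min_parent: int = 2      # parent node minimum size
    max_depth: int = 3       # maximum tree depth


def impurity(labels, criterion):
    n = len(labels)
    props = [c / n for c in Counter(labels).values()]
    if criterion == "gini":
        return 1.0 - sum(p * p for p in props)
    return -sum(p * log2(p) for p in props)  # entropy


def grow(xs, ys, knobs, depth=0):
    """Grow a binary tree on one numeric input; returns nested dicts."""
    leaf = {"predict": Counter(ys).most_common(1)[0][0], "n": len(ys)}
    # Parent-size and depth knobs decide whether this node may split at all.
    if len(ys) < knobs.min_parent or depth >= knobs.max_depth:
        return leaf
    parent = impurity(ys, knobs.criterion)
    best_gain, best_t = 0.0, None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds: x <= t goes left
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Terminal-node-size knob rejects splits that would create tiny leaves.
        if len(left) < knobs.min_leaf or len(right) < knobs.min_leaf:
            continue
        child = (len(left) * impurity(left, knobs.criterion)
                 + len(right) * impurity(right, knobs.criterion)) / len(ys)
        if parent - child > best_gain:
            best_gain, best_t = parent - child, t
    if best_t is None:  # no admissible split reduces impurity
        return leaf
    left_pairs = [(x, y) for x, y in zip(xs, ys) if x <= best_t]
    right_pairs = [(x, y) for x, y in zip(xs, ys) if x > best_t]
    return {
        "threshold": best_t,
        "left": grow(*zip(*left_pairs), knobs, depth + 1),
        "right": grow(*zip(*right_pairs), knobs, depth + 1),
    }


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(grow(xs, ys, TreeKnobs()))
print(grow(xs, ys, TreeKnobs(min_leaf=5)))  # no split can leave 5+ records on each side
```

Real packages use their own names for the same ideas; scikit-learn’s `DecisionTreeClassifier`, for example, exposes `criterion`, `max_depth`, `min_samples_split`, and `min_samples_leaf`.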

The most mathematical of these knobs is the splitting metric. CART-style trees use the Gini Index, C5 trees use Entropy (information gain), and CHAID-style trees use the chi-square test as the splitting criterion. A book I consider the best technical book on data mining and statistical learning methods, “The Elements of Statistical Learning”, has this description of the splitting criteria for decision trees, including the Gini Index and Entropy:
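In the usual textbook notation for classification trees, a node m covers a region R_m containing N_m training records, \hat{p}_{mk} is the proportion of those records in class k (of K classes), and k(m) is the node’s majority class. The standard node-impurity measures are then written as follows; this is offered as the common formulation rather than a verbatim copy of the book’s figure.

```latex
\hat{p}_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k), \qquad k(m) = \arg\max_k \hat{p}_{mk}

\text{Misclassification error:} \quad 1 - \hat{p}_{m\,k(m)}

\text{Gini index:} \quad \sum_{k \neq k'} \hat{p}_{mk}\,\hat{p}_{mk'} = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk})

\text{Cross-entropy (deviance):} \quad -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}
```

A split is chosen to give the largest decrease from the parent node’s impurity to the size-weighted average impurity of its children.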

To a mathematician, these make sense. But to someone without a mathematics background, these equations will be at best opaque and at worst incomprehensible. (And these are not very complicated. Technical textbooks and papers describing machine learning algorithms can be difficult to understand even for seasoned but out-of-practice mathematicians.)

As a predictive modeler with a mathematics background, I must say that the actual splitting equations almost never matter to me. Gini and Entropy often produce the same splits, or at least similar ones. CHAID differs more, especially in how it creates multi-way splits.
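The claim that Gini and Entropy usually agree is easy to check on a toy problem. The Python sketch below scores every threshold on a single numeric input with each criterion and reports the best split. The data are made up for illustration, and on real data the two criteria can occasionally choose different splits.

```python
from collections import Counter
from math import log2


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(xs, ys, impurity):
    """Return (t, gain) for the binary split x <= t that most reduces impurity."""
    parent = impurity(ys)
    best_t, best_gain = None, float("-inf")
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Children are weighted by their share of the parent's records.
        weighted = (len(left) * impurity(left) + len(right) * impurity(right)) / len(ys)
        gain = parent - weighted
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Made-up data: one numeric input and a binary response.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = ["no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes"]

for name, fn in [("gini", gini), ("entropy", entropy)]:
    t, gain = best_threshold(xs, ys, fn)
    print(f"{name:8s} best split: x <= {t} (impurity reduction {gain:.3f})")
```

On this toy data both criteria pick x <= 5, even though the size of the impurity reduction differs; it is the ranking of candidate splits, not the raw numbers, that shapes the tree.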

There are, however, very important reasons for someone on the team to understand the mathematics, or at least to understand qualitatively how these algorithms work. First and foremost, understanding the algorithms helps us uncover why models go wrong. Models can be biased toward splitting on particular variables or even particular records. In some cases, models may appear to perform well when in fact they are brittle. Understanding the math reminds us that this can happen and why.
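One well-known form of this bias is the pull of information gain toward high-cardinality variables. In the Python sketch below (made-up data; `customer_id` and `region` are hypothetical columns), a record identifier splits the data into pure one-record branches and so earns the maximum possible gain, even though it cannot generalize. The gain ratio used by C4.5, which divides the gain by the entropy of the split itself, is one penalty that counteracts this; it is shown here as an illustration, not as the only remedy.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(values, labels):
    """Information gain of a multi-way split with one branch per category."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    children = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - children


def gain_ratio(values, labels):
    """C4.5-style penalty: divide the gain by the entropy of the split itself."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


# Made-up data: 8 records with a binary response.
labels = ["y", "y", "y", "n", "n", "n", "n", "y"]
customer_id = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]  # unique per record
region = ["E", "E", "E", "W", "W", "W", "W", "W"]  # real but imperfect predictor

for name, col in [("customer_id", customer_id), ("region", region)]:
    print(f"{name:12s} gain={info_gain(col, labels):.3f} "
          f"gain_ratio={gain_ratio(col, labels):.3f}")
```

Here the identifier wins on raw gain (1.0 versus about 0.55) but loses on gain ratio (about 0.33 versus about 0.575).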

Knowing that linear regression minimizes a quadratic cost function (squared error) tells us that outliers affect the overall error disproportionately. Understanding how decision trees measure the differences between a parent population and its sub-populations explains why a high-cardinality variable may show up at the top of our tree, and why additional penalties may be needed to reduce this bias.
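The effect of the quadratic cost is easy to see numerically. The Python sketch below (made-up numbers) first measures how much of the total loss a single large residual accounts for under squared versus absolute error, then fits a least-squares line with and without one outlying point.

```python
def share_of_largest(residuals, loss):
    """Fraction of the total loss contributed by the single worst residual."""
    losses = [loss(r) for r in residuals]
    return max(losses) / sum(losses)


def ols_fit(xs, ys):
    """Least-squares line y = a + b*x, the minimizer of the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


residuals = [1, -1, 1, -1, 10]
print(f"squared:  {share_of_largest(residuals, lambda r: r * r):.0%}")
print(f"absolute: {share_of_largest(residuals, abs):.0%}")

xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]
with_outlier = [2, 4, 6, 8, 30]  # one bad response value
print("clean fit (a, b):       ", ols_fit(xs, clean))
print("with outlier fit (a, b):", ols_fit(xs, with_outlier))
```

With residuals of 1, -1, 1, -1 and 10, the one large residual accounts for about 96% of the squared error but about 71% of the absolute error, and a single outlying response moves the least-squares slope from 2 to 6.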

The answer to the question of whether predictive modelers need to know math is this: no, they don’t need to understand the mathematical notation, but neither should they ignore the mathematics. Instead, we all need to understand how the mathematics affects the behavior of the algorithms we use. “Those who ignore statistics are condemned to reinvent it,” warns Bradley Efron of Stanford University. The same applies to mathematics.

Dean Abbott is President of Abbott Analytics in San Diego, California. Mr. Abbott has over 21 years of experience applying advanced data mining, data preparation, and data visualization methods to real-world, data-intensive problems, including fraud detection, risk modeling, text mining, response modeling, survey analysis, planned giving, and predictive toxicology. In addition, Mr. Abbott serves as chief technology officer and mentor for start-up companies focused on applying advanced analytics in their consulting practices.

Mr. Abbott is a seasoned instructor, having taught a wide range of data mining tutorials and seminars for a decade to audiences of up to 400, including PAW, KDD, AAAI, IEEE and several data mining software user conferences. He is the instructor of well-regarded data mining courses, explaining concepts in language readily understood by a wide range of audiences, including analytics novices, data analysts, statisticians, and business professionals. Mr. Abbott has also taught applied data mining courses for major software vendors, including SPSS-IBM Modeler (formerly Clementine), Unica PredictiveInsight (formerly Affinium Model), Enterprise Miner (SAS), Model 1 (Group1 Software), and hands-on courses using Statistica (Statsoft), Tibco Spotfire Miner (formerly Insightful Miner), and CART (Salford Systems).

The Machine Learning Times © 2026 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact

EXCLUSIVE HIGHLIGHTS

Related

Do Predictive Modelers Need to Know Math?

Recommended

Predictive Analytics & Retail

What do we see in Predictive Models?

11 thoughts on “Do Predictive Modelers Need to Know Math?”

Login

Industry News

Connect with Us

Subscription

ADVERTISEMENTS

Produced By:

Archives

The Machine Learning Times © 2026 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190 Produced by: Rising Media & Prediction Impact

The Machine Learning Times © 2026 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact