4 Human-Caused Biases We Need to Fix for Machine Learning

Originally published in The Next Web, October 27, 2018

Bias is an overloaded word. It has multiple meanings, from mathematics to sewing to machine learning, and as a result it’s easily misinterpreted.

When people say an AI model is biased, they usually mean that the model is performing badly. But ironically, poor model performance is often caused by various kinds of actual bias in the data or algorithm.

Machine learning algorithms do precisely what they are taught to do and are only as good as their mathematical construction and the data they are trained on. Algorithms that are biased will end up doing things that reflect that bias.

To the extent that we humans build algorithms and train them, human-sourced bias will inevitably creep into AI models. Fortunately, bias, in every sense of the word as it relates to machine learning, is well understood. It can be detected and it can be mitigated — but we need to be on our toes.

There are four distinct types of machine learning bias that we need to be aware of and guard against.

1. Sample Bias

Sample bias is a problem with training data. It occurs when the data used to train your model does not accurately represent the environment that the model will operate in. There is virtually no situation where an algorithm can be trained on the entire universe of data it could interact with.

But there’s a science to choosing a subset of that universe that is both large enough and representative enough to mitigate sample bias. This science is well understood by social scientists, but not all data scientists are trained in sampling techniques.
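
A standard tool from that science is stratified sampling: partition the data pool by the conditions that matter, then sample so every condition is represented in the proportions you need. Below is a minimal sketch using pandas and scikit-learn; the `condition` labels and the 70/30 day-night mix are hypothetical, chosen only to show the mechanics.

```python
# Minimal sketch of stratified sampling; the dataset and its
# 70/30 day-night mix are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset: each row is a driving frame tagged with the
# lighting condition it was captured under.
frames = pd.DataFrame({
    "frame_id": range(1000),
    "condition": ["day"] * 700 + ["night"] * 300,
})

# Stratifying on the condition column preserves the day/night ratio
# in both splits, so neither split under-represents an operating
# condition the model must handle.
train, test = train_test_split(
    frames,
    test_size=0.2,
    stratify=frames["condition"],
    random_state=42,
)

print(train["condition"].value_counts(normalize=True))
print(test["condition"].value_counts(normalize=True))
```

The `stratify` argument is what does the work: without it, a random split can drift away from the pool's condition mix, quietly reintroducing sample bias into training or evaluation.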

We can use an obvious but illustrative example involving autonomous vehicles. If your goal is to train an algorithm to autonomously operate cars during the day and night, but train it only on daytime data, you’ve introduced sample bias into your model. Training the algorithm on both daytime and nighttime data would eliminate this source of sample bias.
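
Before collecting more data, it helps to quantify the mismatch. The sketch below applies a chi-square goodness-of-fit test to compare the training mix against the mix the deployed model is expected to face; the frame counts and the assumed 60/40 day-night operating split are illustrative assumptions, not figures from the article.

```python
# Quick diagnostic (sketch): does the training mix of conditions
# match the mix the model will face in deployment? All numbers
# here are illustrative assumptions.
from scipy.stats import chisquare

observed_counts = [9500, 500]    # day and night frames in the training set
deployment_split = [0.6, 0.4]    # assumed share of day vs. night driving

total = sum(observed_counts)
expected_counts = [share * total for share in deployment_split]

# A very small p-value means the training data diverges from the
# operating environment, a red flag for sample bias.
statistic, p_value = chisquare(observed_counts, f_exp=expected_counts)
print(f"chi-square = {statistic:.1f}, p = {p_value:.3g}")
```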

Continue reading this article here.

About the Author

Glen Ford is Director of Product Management at Alegion, with over 20 years of experience building web applications for Fortune 500 companies as a solutions engineer, web app engineer, and product manager. Alegion combines a software platform with a global pool of data specialists to offload complex, large-scale tasks, such as training data preparation, model testing, and post-production exception handling, from enterprise data science teams.
