4 Human-Caused Biases We Need to Fix for Machine Learning

Originally published in The Next Web, October 27, 2018

Bias is an overloaded word. It has multiple meanings, from mathematics to sewing to machine learning, and as a result it’s easily misinterpreted.

When people say an AI model is biased, they usually mean that the model is performing badly. But ironically, poor model performance is often caused by various kinds of actual bias in the data or algorithm.

Machine learning algorithms do precisely what they are taught to do and are only as good as their mathematical construction and the data they are trained on. An algorithm built or trained with bias will produce results that reflect that bias.

To the extent that we humans build algorithms and train them, human-sourced bias will inevitably creep into AI models. Fortunately, bias, in every sense of the word as it relates to machine learning, is well understood. It can be detected and it can be mitigated — but we need to be on our toes.

There are four distinct types of machine learning bias that we need to be aware of and guard against.

1. Sample Bias

Sample bias is a problem with training data. It occurs when the data used to train your model does not accurately represent the environment that the model will operate in. There is virtually no situation where an algorithm can be trained on the entire universe of data it could interact with.

But there’s a science to choosing a subset of that universe that is both large enough and representative enough to mitigate sample bias. This science is well understood by social scientists, but not all data scientists are trained in sampling techniques.
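
To make the idea concrete, here is a minimal sketch (not from the original article) of one common mitigation from that sampling toolkit: stratified sampling with scikit-learn, which draws a subset that preserves the proportions of a chosen attribute. The DataFrame and its "condition" column are hypothetical stand-ins for real training data.

    # Sketch: drawing a stratified sample so the training subset preserves
    # the attribute proportions of the full dataset. The DataFrame `df` and
    # its "condition" column are hypothetical.
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.DataFrame({
        "feature": range(1000),
        "condition": ["day"] * 800 + ["night"] * 200,  # 80/20 mix
    })

    # stratify= keeps the 80/20 day/night ratio in both resulting subsets,
    # so neither split over- or under-represents a condition by chance.
    train, test = train_test_split(
        df, test_size=0.2, stratify=df["condition"], random_state=0
    )

    print(train["condition"].value_counts(normalize=True))  # ~0.8 day, ~0.2 night
    print(test["condition"].value_counts(normalize=True))   # ~0.8 day, ~0.2 night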

We can use an obvious but illustrative example involving autonomous vehicles. If your goal is to train an algorithm to autonomously operate cars during the day and night, but train it only on daytime data, you’ve introduced sample bias into your model. Training the algorithm on both daytime and nighttime data would eliminate this source of sample bias.
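
As a rough illustration of how such a gap might be caught before training, the sketch below compares the lighting conditions present in a hypothetical training set against an assumed deployment mix. The "lighting" column, the 50/50 day-night target, and the tolerance threshold are all invented for the example.

    # Sketch: flagging a mismatch between the conditions represented in the
    # training data and the mix expected in deployment. The "lighting" column
    # and the expected 50/50 day/night mix are assumptions for illustration.
    import pandas as pd

    train = pd.DataFrame({"lighting": ["day"] * 950 + ["night"] * 50})

    expected = {"day": 0.5, "night": 0.5}  # assumed operating environment

    observed = train["lighting"].value_counts(normalize=True)
    for condition, target in expected.items():
        share = observed.get(condition, 0.0)
        if abs(share - target) > 0.10:  # tolerance chosen arbitrarily for the sketch
            print(f"Possible sample bias: '{condition}' is {share:.0%} of "
                  f"training data vs. ~{target:.0%} expected in deployment.")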

Continue reading this article here.

About the Author

Glen Ford is Director of Product Management at Alegion, with over 20 years of experience building web applications for Fortune 500 companies as a solutions engineer, web application engineer, and product manager. Alegion combines a software platform with a global pool of data specialists to offload complex, large-scale tasks such as training data preparation, model testing, and post-production exception handling from enterprise data science teams.
