Originally published in The Next Web, October 27, 2018
Bias is an overloaded word. It has multiple meanings, from mathematics to sewing to machine learning, and as a result it’s easily misinterpreted.
When people say an AI model is biased, they usually mean that the model is performing badly. But ironically, poor model performance is often caused by various kinds of actual bias in the data or algorithm.
Machine learning algorithms do precisely what they are taught to do and are only as good as their mathematical construction and the data they are trained on. Algorithms that are biased will end up doing things that reflect that bias.
To the extent that we humans build algorithms and train them, human-sourced bias will inevitably creep into AI models. Fortunately, bias, in every sense of the word as it relates to machine learning, is well understood. It can be detected and it can be mitigated — but we need to be on our toes.
There are four distinct types of machine learning bias that we need to be aware of and guard against.
1. Sample Bias
Sample bias is a problem with training data. It occurs when the data used to train your model does not accurately represent the environment that the model will operate in. There is virtually no situation where an algorithm can be trained on the entire universe of data it could interact with.
But there’s a science to choosing a subset of that universe that is both large enough and representative enough to mitigate sample bias. This science is well understood by social scientists, but not all data scientists are trained in sampling techniques.
We can use an obvious but illustrative example involving autonomous vehicles. If your goal is to train an algorithm to autonomously operate cars during the day and night, but train it only on daytime data, you’ve introduced sample bias into your model. Training the algorithm on both daytime and nighttime data would eliminate this source of sample bias.
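One way to catch this kind of gap before training is to compare the mix of conditions in the training set against the mix you expect in production. The sketch below is a minimal illustration, not a prescribed method: the function names, the "day" and "night" labels, the 60/40 operating mix and the 10-point tolerance are all assumptions made up for this example.

```python
from collections import Counter


def condition_shares(labels):
    """Return the fraction of samples that carry each condition label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def coverage_gaps(train_labels, expected_shares, tolerance=0.10):
    """Find conditions that are under-represented in the training data.

    A condition is reported when its share of the training data falls
    short of its expected operating share by more than `tolerance`.
    Returns {condition: (training_share, expected_share)}.
    """
    train_shares = condition_shares(train_labels)
    gaps = {}
    for condition, expected in expected_shares.items():
        actual = train_shares.get(condition, 0.0)
        if expected - actual > tolerance:
            gaps[condition] = (actual, expected)
    return gaps


# Hypothetical training set: every frame was captured during the day.
train_labels = ["day"] * 1000

# Assumed operating mix, chosen purely for illustration.
expected_shares = {"day": 0.6, "night": 0.4}

print(coverage_gaps(train_labels, expected_shares))
# → {'night': (0.0, 0.4)}
```

A check like this only covers the conditions you thought to label. It will not reveal a gap in a dimension nobody recorded, such as weather or road type, which is why the choice of what to measure matters as much as the measurement.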
About the Author
Glen Ford is Director of Product Management at Alegion, with over 20 years of experience building web applications for Fortune 500 companies as a solutions engineer, web app engineer and product manager. Alegion combines a software platform and a global pool of data specialists to offload complex, large-scale tasks such as training data preparation, model testing and post-production exception handling from enterprise data science teams.