Machine Learning Times
Moving Beyond “Algorithmic Bias is a Data Problem”

 
Originally published in Patterns, April 9, 2021.
A surprisingly sticky belief is that a machine learning model merely reflects existing algorithmic bias in the dataset and does not itself contribute to harm. Why, despite clear evidence to the contrary, does the myth of the impartial model still hold allure for so many within our research community? Algorithms are not impartial, and some design choices are better than others. Recognizing how model design impacts harm opens up new mitigation techniques that are less burdensome than comprehensive data collection.

In the absence of intentional interventions, a trained machine learning model can and does amplify undesirable biases in the training data. A rich body of work to date has examined these forms of problematic algorithmic bias, finding disparities—relating to race, gender, geo-diversity, and more—in the performance of machine learning models.
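The disparities referenced above are usually surfaced by disaggregating a standard metric across subgroups rather than reporting a single aggregate number. The sketch below is a minimal, generic illustration of that idea, not code from the paper; the labels, predictions, and group identifiers are hypothetical placeholders.

```python
import numpy as np

def per_group_accuracy(y_true, y_pred, groups):
    """Disaggregate accuracy by a (hypothetical) sensitive attribute.

    y_true, y_pred : arrays of ground-truth labels and model predictions
    groups         : array of group identifiers for each example
    Returns a dict mapping each group to its accuracy, so gaps between
    groups -- the disparities discussed above -- become visible.
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {
        g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
        for g in np.unique(groups)
    }

# Toy usage with made-up labels: group "b" is served far worse than "a".
print(per_group_accuracy(
    y_true=[1, 0, 1, 1, 0, 1],
    y_pred=[1, 0, 1, 0, 1, 0],
    groups=["a", "a", "a", "b", "b", "b"],
))
```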

However, a surprisingly prevalent belief is that a machine learning model merely reflects existing algorithmic bias in the dataset and does not itself contribute to harm. Here, we start out with a deceptively simple question: how does model design contribute to algorithmic bias?

A more nuanced understanding of what contributes to algorithmic bias matters because it also dictates where we spend effort mitigating harm. If algorithmic bias is merely a data problem, the often-touted solution is to de-bias the data pipeline. However, data “fixes” such as re-sampling or re-weighting the training distribution are costly and hinge on (1) knowing a priori what sensitive features are responsible for the undesirable bias and (2) having comprehensive labels for protected attributes and all proxy variables.
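To make concrete why such data fixes hinge on labeled protected attributes, the sketch below shows one common re-weighting scheme: each training example is weighted inversely to the size of its group, so every group contributes equally to the weighted training loss. This is a generic illustration under that assumption, not the paper's method, and it presumes the sensitive-attribute label is known for every training example, which is exactly the costly requirement noted above.

```python
import numpy as np

def inverse_frequency_weights(groups):
    """Per-example weights that up-weight under-represented groups.

    groups : array of sensitive-attribute labels for the training set.
    Each example gets weight N / (n_groups * count(its group)), so all
    groups carry equal total weight in the training objective. This
    requires knowing the sensitive attribute for every example.
    """
    groups = np.asarray(groups)
    values, counts = np.unique(groups, return_counts=True)
    weight_per_group = len(groups) / (len(values) * counts)
    lookup = dict(zip(values, weight_per_group))
    return np.array([lookup[g] for g in groups])

# Toy usage: the minority group "b" receives the larger weight.
weights = inverse_frequency_weights(["a", "a", "a", "b"])
print(weights)  # [0.667, 0.667, 0.667, 2.0]
# Weights like these could be passed as `sample_weight` to estimators
# that accept one, e.g. LogisticRegression().fit(X, y, sample_weight=weights).
```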

 

To continue reading this article, click here.
