Machine Learning Times
MIT Apologizes, Permanently Pulls Offline Huge Dataset That Taught AI Systems To Use Racist, Misogynistic Slurs

Originally published in The Register, July 1, 2020

Top uni takes action after El Reg highlights concerns by academics.

Special report: MIT has taken offline its highly cited dataset that trained AI systems to potentially describe people using racist, misogynistic, and other problematic terms.

The database was removed this week after The Register alerted the American super-college. MIT also urged researchers and developers to stop using the training library, and to delete any copies. “We sincerely apologize,” a professor told us.

The training set, built by the university, has been used to teach machine-learning models to automatically identify and list the people and objects depicted in still images. For example, if you show one of these systems a photo of a park, it might tell you about the children, adults, pets, picnic spreads, grass, and trees present in the snap. Thanks to MIT’s cavalier approach when assembling its training set, though, these systems may also label women as whores or bitches, and Black and Asian people with derogatory language. The database also contained close-up pictures of female genitalia labeled with the C-word.
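To make that failure mode concrete, here is a minimal sketch of the kind of image-labeling pipeline the article describes. It uses torchvision's pretrained ResNet-50 purely as a stand-in, not a model trained on MIT's dataset, and the filename `park_photo.jpg` is hypothetical. The point is that a classifier can only emit labels drawn from its training vocabulary, which is why offensive labels in a training set resurface in predictions.

```python
# Sketch: describing a photo with a trained image classifier.
# Stand-in model: torchvision's pretrained ResNet-50 (NOT MIT's dataset).
import torch
from torchvision import models
from torchvision.models import ResNet50_Weights
from PIL import Image

weights = ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()                    # resize/normalize as the model expects
img = Image.open("park_photo.jpg").convert("RGB")    # hypothetical input photo

with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))
    probs = logits.softmax(dim=1)[0]

# The output vocabulary is fixed by the training labels -- whatever terms
# the dataset's labelers used are exactly what the model can say.
top_probs, top_ids = probs.topk(5)
for p, i in zip(top_probs.tolist(), top_ids.tolist()):
    print(f"{weights.meta['categories'][i]}: {p:.2%}")
```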

Applications, websites, and other products relying on neural networks trained using MIT’s dataset may therefore end up using these terms when analyzing photographs and camera footage.

The problematic training library in question is 80 Million Tiny Images, which was created in 2008 to help produce advanced object-detection techniques. It is, essentially, a huge collection of photos with labels describing what’s in the pics, all of which can be fed into neural networks to teach them to associate patterns in photos with the descriptive labels. So when a trained neural network is shown a bike, it can accurately predict that a bike is present in the snap. It’s called Tiny Images because the pictures in the library are small enough for the computer-vision algorithms of the late 2000s and early 2010s to digest.
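For readers unfamiliar with how such a library "teaches" a network, the sketch below trains a deliberately small classifier on CIFAR-10, a 32×32-pixel dataset that was itself curated out of 80 Million Tiny Images. This is an illustrative stand-in under those assumptions, not MIT's original pipeline.

```python
# Sketch: feeding a labeled tiny-image collection to a neural network so it
# learns to map pixel patterns to descriptive labels. CIFAR-10 stands in for
# Tiny Images; its 32x32 frames are the "tiny" size late-2000s models could digest.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

# A deliberately small convolutional classifier: 32x32 RGB in, 10 labels out.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One pass over the labeled data is enough for the sketch: each step nudges
# the network toward associating each image's patterns with its label.
for images, labels in loader:
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```

The labels the network learns are exactly whatever strings the dataset pairs with its images, which is how slurs in Tiny Images' label vocabulary propagated into downstream systems.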

To continue reading this article, click here.
