The process used to build most of the machine-learning models we use today can’t tell whether they will work in the real world, and that’s a problem.
It’s no secret that machine-learning models tuned and tweaked to near-perfect performance in the lab often fail in real settings. This is typically put down to a mismatch between the data the AI was trained and tested on and the data it encounters in the world, a problem known as data shift. For example, an AI trained to spot signs of disease in high-quality medical images will struggle with blurry or cropped images captured by a cheap camera in a busy clinic.
Now a group of 40 researchers across seven different teams at Google has identified another major cause of the common failure of machine-learning models. Called “underspecification,” it could be an even bigger problem than data shift. “We are asking more of machine-learning models than we are able to guarantee with our current approach,” says Alex D’Amour, who led the study.
D’Amour’s initial investigation snowballed and dozens of Google researchers ended up looking at a range of different AI applications, from image recognition to natural language processing (NLP) to disease prediction. They found that underspecification was to blame for poor performance in all of them. The problem lies in the way that machine-learning models are trained and tested, and there’s no easy fix.