By: Eric Siegel, Ph.D., Predictive Analytics World
This article covers additional examples of the “accuracy fallacy” – it is an addendum to my article, “Accuracy Fallacy: Orchestrating the Media’s Bogus Coverage of AI” (link forthcoming). I strongly suggest you read the other article first, since it introduces the concept and lays the groundwork.
Breaking News: Psychotic Breaks Are Still Mostly Unpredictable
“Machine learning algorithms can help psychologists predict, with 90% accuracy, the onset of psychosis by analyzing a patient’s conversations.” Thus opens an article in The Register (U.K.) eagerly covering an overzealous report out of Emory and Harvard Universities. With their work enshrined by the credibility of a publication in Nature, the researchers have the press believing their predictive model can confidently foretell who will develop psychosis and who won’t.
In this case, the researchers perpetrate a variation on the “accuracy fallacy” scheme: They report the classification accuracy you would get if half the cases were positive – that is, in a world in which 50% of the patients will eventually be diagnosed with psychosis. There’s a word for measuring accuracy in this way: cheating. Mathematically, this usually inflates the reported “accuracy” a bit less than the pairing test described in the first article does, but it’s fairly similar and it far overstates performance in much the same way.
The Emory/Harvard report oversells magnificently. It features a tantalizing “90% accuracy” in its opening summary while omitting two key qualifications needed to set a meaningful context. First, what proportion of patients is expected to develop psychosis? That is, how often is the predictive model expected to field positive cases in its intended deployment outside the lab? That fundamental figure goes undisclosed. However, in trying to ascertain this, a persistent reader can chase citations and ultimately infer that the model is designed not for the general population, but rather only for patients who are help-seeking and therefore presumably at a somewhat higher risk. While the general population exhibits only a 3% rate of psychotic disorders, one of the samples included in this study (the training data) exhibited a 23% rate. If that’s the standard, it is still a good distance from the 50% over which they report performance. Second, their main result of 90% accuracy was established over a remarkably small number of cases: a sample of only 10 patients.
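The second qualification, the tiny sample, can be made concrete. What follows is a minimal sketch, not anything taken from the report: it assumes the 90% figure corresponds to 9 correct classifications out of 10 and applies the standard Wilson score interval for a proportion. The function name `wilson_interval` is illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Assumption: "90% accuracy" over 10 patients means 9 correct out of 10.
low, high = wilson_interval(successes=9, n=10)
print(f"Plausible range for the true rate: {low:.2f} to {high:.2f}")
```

Under that assumption the range runs from roughly 60% to 98%, so a sample of 10 is consistent with anything from mediocre to excellent performance.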
Unfortunately, mental illness is still tough to predict and, no, machine learning is not on its way to solving psychiatry, contrary to the belief held by some psychiatrists that AI will take over their jobs. This predictive model faces much the same limitations and trade-offs as the one that predicts sexual orientation, covered in the first article. It will not be able to predict psychotic onsets without incurring many false positives. And, as before, “accuracy” isn’t even a pertinent benchmark for judging predictive performance.
Accuracy: A Word So Often Used Inaccurately
The list goes on and on, with many more examples of overblown claims about machine learning that perpetrate the “accuracy fallacy.”
Criminality. The Global Times (China) ran the headline, “Professor Claims AI Can Spot Criminals by Looking at Photos 90% of the Time.” Also reporting on this work, in which a model predicts criminality based on facial features, MIT Technology Review and The Telegraph (U.K.) each repeated the 90% accuracy claim. But the masses have been misled; throughout their original publication, the researchers use “accuracy” to actually mean AUC (the area under the receiver operating characteristic curve), which is a different measure.
Death. One headline claimed, “Google AI Predicts Hospital Inpatient Death Risks with 95% Accuracy.” Google researchers published this result in Nature, leading the press astray by leaving it implied within the summary that AUC is a way to measure accuracy.
Suicide. The press reported on a model “that predicted suicide risk, using electronic health records, with 84 to 92% accuracy within one week of a suicide event.” The Vanderbilt University researchers pulled the same maneuver as Google, leaving it implied within the summary of their research publication that AUC is equivalent to accuracy.
Bestselling books. Beyond predicting the health and behavior of humans, machine learning predicts the sales of books. What if publishers could decide whether to greenlight each unpublished manuscript by knowing beforehand whether it would very likely go on to become a bestseller? Spoiler: They can’t. However, in the book, “The Bestseller Code: Anatomy of the Blockbuster Novel,” the authors claim they’ve “written an algorithm that can tell whether a manuscript will hit the New York Times bestseller list with 80% accuracy,” as The Guardian (U.K.) put it. The Wall Street Journal and The Independent (U.K.) also reported this level of accuracy. However, the authors conveniently established this accuracy level over a manufactured test set of books that were half bestsellers and half not bestsellers. Since in reality only one in 200 of the books included in this study was destined to become a bestseller, it turns out that a manuscript predicted by the model as a “future bestseller” actually has less than a 2% probability of becoming one.
And many more. The accuracy fallacy pervades, with researchers perpetrating it in reports on spotting legal issues in non-disclosure agreements, predicting which employees will quit (IBM claims 95% accuracy), classifying which news headlines are “clickbait,” detecting fraudulent dating profiles, spotting cyberbullies, predicting the need for first responders after an earthquake, detecting diseases in banana crops, distinguishing high- and low-quality embryos for in vitro fertilization, predicting heart attacks, predicting heart issues by eye scan, detecting anxiety and depression in children, diagnosing brain tumors from medical images, detecting brain tumors with a new blood test, and predicting the development of Alzheimer’s.
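The bestseller figure cited above (less than a 2% chance that a flagged manuscript becomes a bestseller) follows from Bayes’ rule. Here is a rough sketch of that arithmetic. It assumes the 80% accuracy on the half-and-half test set splits evenly into 80% sensitivity and 80% specificity, which the coverage does not state; `precision_at_prevalence` is a hypothetical helper name.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Probability that a case flagged positive is truly positive."""
    true_pos = sensitivity * prevalence                # positives correctly flagged
    false_pos = (1 - specificity) * (1 - prevalence)   # negatives wrongly flagged
    return true_pos / (true_pos + false_pos)

# Assumption: 80% balanced-set accuracy means sensitivity = specificity = 0.80.
balanced = precision_at_prevalence(0.80, 0.80, prevalence=0.5)
realistic = precision_at_prevalence(0.80, 0.80, prevalence=1 / 200)

print(f"Half bestsellers in the test set: {balanced:.0%} of flagged books are bestsellers")
print(f"One bestseller in 200:            {realistic:.2%} of flagged books are bestsellers")
```

The same model that looks 80% right on a manufactured test set is wrong about roughly 98 of every 100 manuscripts it flags once bestsellers are as rare as they are in reality.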
The Accuracy Fallacy
This crafty misuse of the word “accuracy” cannot stand. Researchers dramatically misinform the public by erroneously using “accuracy” to mean AUC – or, similarly, by reporting accuracy over an artificially balanced test bed that’s half positive examples and half negative, without spelling out the severe limits of that performance measure right up front. The responsibility falls first on researchers to communicate with journalists unambiguously and without misleading them, and second on journalists to make sure they actually understand the predictive proficiency they’re reporting on.
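To see why AUC and accuracy are not interchangeable, consider a toy sketch. The scores below are invented purely for illustration and come from no study mentioned here. AUC is computed as the share of (positive, negative) pairs in which the positive case gets the higher score, and accuracy as the share of cases classified correctly at a chosen threshold.

```python
def auc(pos_scores, neg_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def accuracy(pos_scores, neg_scores, threshold):
    """Share of all cases classified correctly when score >= threshold means 'positive'."""
    correct = sum(s >= threshold for s in pos_scores)
    correct += sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))

# Invented toy data: 2 positive cases and 20 negative cases.
pos_scores = [0.9, 0.6]
neg_scores = [0.8, 0.75, 0.7, 0.65] + [0.1] * 16

model_auc = auc(pos_scores, neg_scores)
model_acc = accuracy(pos_scores, neg_scores, threshold=0.5)
# Baseline that never predicts "positive": a threshold above every score.
baseline_acc = accuracy(pos_scores, neg_scores, threshold=1.0)

print(f"AUC: {model_auc:.2f}")
print(f"Accuracy at threshold 0.5: {model_acc:.2f}")
print(f"Accuracy of always predicting negative: {baseline_acc:.2f}")
```

On this toy data the model’s AUC is 0.90, yet its accuracy is about 0.82, lower than the 0.91 earned by a baseline that never predicts a positive. Reporting the first number as “90% accuracy” would describe neither.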
The accuracy fallacy plays an integral part in the harmful hyping of “AI” in general. By conveying unrealistic levels of performance, researchers exploit – and simultaneously feed into – the public’s fear of awesome, yet fictional, powers held by machine learning (commonly calling it artificial intelligence instead). Making matters worse, machine learning is further oversold because artificial intelligence is “over-souled” by proselytizers – they credit it with its own volition and humanlike intelligence (thanks to Eric King of “The Modeling Agency” for that pun).
About the Author
Eric Siegel, Ph.D., founder of the Predictive Analytics World and Deep Learning World conference series and executive editor of The Predictive Analytics Times, makes the how and why of predictive analytics (aka machine learning) understandable and captivating. He is the author of the award-winning book Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, the host of The Dr. Data Show web series, a former Columbia University professor, and a renowned speaker, educator, and leader in the field. Follow him at @predictanalytic.