Machine Learning Times
The rise of pred med: How to keep predictive medicine ethical – and useful


Big Data is moving into America’s hospitals, whether we like it or not. Electronic health records are combining with clinical analytics that can synthesize huge amounts of data. Soon doctors will rely on predictive analytics to more accurately diagnose patients and allocate medical resources.

Predictive analytics, according to Glenn Cohen in an article he co-authored for Health Affairs in July, is “the use of electronic algorithms that forecast future events in real time.” The data fed into these algorithms comes from disease registries, electronic health care records, claims data and patient surveys. The goal is to glean solutions for individual patients from everyone’s collective medical history.

Predictive health care analytics have undoubtedly benefited patients. But some doctors and journalists talk about it as the coming of the Reformation in medicine. “In many ways,” wrote physicians Eric Topol and Robert Cook for The Wall Street Journal, “the profession of medicine today is where Christianity was when the Gutenberg Bible put scriptures into the hands of the laity.” In other words, power, access and decision-making will be wrenched from the closed fist of the medical industrial complex and redistributed to the patients.

But whether we are truly on the verge of a revolutionary change in the way health care is administered is very much up for debate. Private companies are using patient information to pad their bottom lines, with few restrictions on what sorts of patient data they are allowed to use, raising serious ethical questions of patient privacy and equity. Who ends up controlling this information, and to what end, will determine whether the benefit of the data goes to patients or to profits.

Possible advantages

Certainly, predictive analytics offers myriad possible boons to patients. Such models have most commonly been used in their early years of development to identify patients who are at high risk of hospital readmission. They’ve also been used to determine which patients are at risk of serious complications from treatments. Emory’s Winship Cancer Institute, for example, is teaming up with IBM, using its data-mining tools to improve cancer treatment by personalizing it for each patient, which the institute believes will improve health outcomes.
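Readmission-risk models of the kind described above typically reduce to a classifier that maps a patient’s record to a probability. The sketch below is purely illustrative — the feature names and weights are invented, whereas a real model would fit them to historical discharge records:

```python
import math

# Toy readmission-risk scorer. Feature names and weights are invented
# for illustration; real systems learn these from historical data.
WEIGHTS = {
    "prior_admissions": 0.45,        # each prior admission raises risk
    "num_chronic_conditions": 0.30,  # comorbidities raise risk
    "days_since_discharge": -0.02,   # risk decays as time passes
}
BIAS = -2.0

def readmission_risk(patient):
    """Return a logistic risk score in (0, 1) from a dict of features."""
    z = BIAS + sum(w * patient.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

high = readmission_risk({"prior_admissions": 4,
                         "num_chronic_conditions": 3,
                         "days_since_discharge": 2})
low = readmission_risk({"prior_admissions": 0,
                        "num_chronic_conditions": 0,
                        "days_since_discharge": 30})
print(high > low)  # the frequently admitted patient scores higher
```

The ethical questions raised later in this piece arise precisely because nothing in such a score distinguishes clinical need from, say, profitability — the same machinery ranks patients by whatever target the operator chooses.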

More tantalizing yet is the tremendous good these analytics could do for patients in the future. Take, for instance, the case of adults vying for limited beds in the intensive care unit. In this case, physicians could employ predictive technology to determine risk on an individual basis for thousands of patients. This database would be continuously updated to give the doctor the most accurate picture possible of who should have access to those beds.

None of this would be possible without private sector investment in predictive technology and interest in growing hulking businesses from the sophisticated cultivation of data points. But not all Big Data ventures between companies and hospitals put improving patient care above cutting costs. At best, the two processes are seen as symbiotic, and at worst, you get companies that leverage huge databases of medical records for advertising purposes.

Take, for example, MedSeek, a health care software company that helps hospitals determine which patients are most profitable and how to pitch their services to those patients so they keep coming back for more. They use data points such as race, socioeconomic status and age to shape their expert opinions. For example, The New York Times reported a few weeks back that MedSeek is helping one of its clients, Trinity Health System in Michigan, use these analytics to identify well-insured patients and encourage them to book more doctor visits and schedule more tests.

Big Data has the potential to allow insurers or medical providers to develop strategies to avoid high-cost patients.
Glenn Cohen
professor of law, Harvard University

MedSeek makes no bones about its belief that the role of patients is now transitioning from passive recipients of care to active consumers of hospital supply chains. Since the patient is now, in theory, no more or less important to hospitals than online shoppers are to Amazon shareholders and CEOs, MedSeek can boast to its prospective clients that it will “Convert more high-value prospects into loyal, paying patients.” Some observers might raise ethical concerns about hospitals’ using patients’ personal information to keep them caught in a duplicitously spun web of health care services. But MedSeek just calls it personalized care.

Under MedSeek’s cruel corporate logic, a small but significant class of patients is less likely to receive this personalized care. Approximately 5 percent of patients account for half of all U.S. health care spending. Many doctors and policymakers suggest that identifying these high-cost patients is one way that predictive analytics can make health care more cost effective. However, high-cost patients are those with one or several complex medical conditions, often paired with socioeconomic struggle and behavioral issues. And as Cohen remarked in a recent interview with Vox, “Big Data has the potential to allow insurers or medical providers to develop strategies to avoid high-cost patients” by spotting them not through illegal means like sorting by wealth and age but through sophisticated data mining.

There are no laws or policies to deal with data so opaque that it masks increasing socioeconomic disparities behind a veil of health statistics. Yet this is a future we need to prepare for now. Companies such as White Cloud Analytics, IBM care analytics, Verisk Health and dozens of other data-mining companies don’t use their analytics to help hospitals pitch their services to financially desirable patients. But they all advertise their ability to make hospitals more profitable, many by helping identify high-risk patients. These companies currently have no incentives aside from adding to their bottom lines.

Desirable patients

Questions of racial and socioeconomic equity are deeply entwined with the privacy and consent problems that Big Data creates. If predictive health care analytics increase socioeconomic disparities, they will be using everyone’s private information to do so.

And just as there is not yet a good way to regulate socioeconomic disparities caused by algorithmic health care, there are very few rules companies have to abide by when rifling through a population’s medical information.

Recent protections for patients’ privacy — most notably the 1996 Health Insurance Portability and Accountability Act (HIPAA) — have been used in direct opposition to patient rights. Hospitals, for example, have invoked HIPAA several times to deny patients access to their medical files when, in fact, the law guarantees the opposite. HIPAA also has rules for how data must be de-identified before it can be used. There are two ways of doing this. The first option is removing a set of 18 specified identifiers, such as the patient’s name, email address and Social Security number, from the data set, which would impinge upon its predictive capabilities. The second is having an individual with appropriate expertise declare the risk of identification very small, but firmer language on what exactly this means when quality assurance is on the line has yet to be established.
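Mechanically, the first de-identification route amounts to dropping the direct-identifier fields from each record before analysis. A minimal sketch, with a hypothetical record and only a small subset of the identifiers the rule actually names:

```python
# Illustrative "Safe Harbor"-style de-identification: strip direct
# identifiers from a record before it enters an analytics pipeline.
# This field list is a small subset of the 18 identifiers HIPAA names;
# the record below is invented for illustration.
DIRECT_IDENTIFIERS = {"name", "email", "ssn", "phone", "address"}

def deidentify(record):
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

record = {"name": "Jane Doe", "ssn": "000-00-0000",
          "age": 67, "diagnosis": "CHF", "prior_admissions": 4}
print(deidentify(record))
# {'age': 67, 'diagnosis': 'CHF', 'prior_admissions': 4}
```

Note what survives the scrubbing: age, diagnosis, admission history — exactly the fields a data miner needs to infer the socioeconomic and health profiles discussed above, which is why removing names alone settles so little.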

Then there’s the question of whether predictive health care analytics functions as quality assurance at all. Certainly the collection and application of this data bears a likeness to quality assurance projects, in which data is used for improvement and management purposes and consent is not required. In contrast, medical studies conducted for the purpose of producing original research of course require consent. It’s difficult to say whether a computer that’s constantly coordinating data and spitting out solutions is doing research and running tests or if it’s only assessing the business of running a hospital (not to mention how useful this distinction is in application to predictive analytics in the first place).

Also at issue, as Cohen points out, is that patients must sign a “highly legalistic” form in order to comply with HIPAA and may not be aware of these new uses of their data when they do so.

Furthermore, what companies do with patient medical information and how much of it they are obliged to anonymize is very much a legal and ethical gray zone at the moment.

What it all boils down to is this: We need to put limits not only on what information companies are able to use in a medical context but also on the purposes they use it for. We must make equity of care and patient quality of life the bottom line of American health care before we embrace Big Data as a panacea, or else it may do more harm than good.

After all, as MedSeek’s website helpfully notes, “Patients are people too.”

By: Hannah K. Gold, independent journalist
Originally published at
