How cognitive biases impact healthcare decisions

https://www.linkedin.com/pulse/how-cognitive-biases-impact-healthcare-decisions-robert-pearl-m-d–ti5qc/?trackingId=eQnZ0um3TKSzV0NYFyrKXw%3D%3D

Day one of the healthcare strategy course I teach in the Stanford Graduate School of Business begins with this question: “Who here receives excellent medical care?”

Most of the students raise their hands confidently. I look around the room at some of the most brilliant young minds in business, finance and investing—all of them accustomed to making quick yet informed decisions. They can calculate billion-dollar deals to the second decimal point in their heads. They pride themselves on being data-driven and discerning.

Then I ask, “How do you know you receive excellent care?”

The hands slowly come down and the room falls silent. In that moment, it’s clear these future business leaders have reached a conclusion without a shred of reliable data or evidence.

Not one of them knows how often their doctors make diagnostic or technical errors. They can’t say whether their health system’s rate of infection or medical error is high, average or low.

What’s happening is that they’re conflating service with clinical quality. They assume a doctor’s bedside manner correlates with excellent outcomes.

These often false assumptions are part of a multi-millennia-long relationship wherein patients are reluctant to ask doctors uncomfortable but important questions: “How many times have you performed this procedure over the past year and how many patients experienced complications?” “What’s the worst outcome a patient of yours had during and after surgery?”

The answers are objective predictors of clinical excellence. Without them, patients are likely to become victims of the halo effect—a cognitive bias where positive traits in one area (like friendliness) are assumed to carry over to another (medical expertise).

This is just one example of the many subconscious biases that distort our perceptions and decision-making.

From the waiting room to the operating table, these biases impact both patients and healthcare professionals with negative consequences. Acknowledging these biases isn’t just an academic exercise. It’s a crucial step toward improving healthcare outcomes.

Here are four more cognitive errors that cause harm in healthcare today, along with my thoughts on what can be done to mitigate their effects:

Availability bias

You’ve probably heard of the “hot hand” in Vegas—a lucky streak at the craps table that draws big cheers from onlookers. But luck is an illusion, a product of our natural tendency to see patterns where none exist. Nothing about the dice changes based on the last throw or the individual shaking them.

This mental error, first described as “availability bias” by psychologists Amos Tversky and Daniel Kahneman, was part of groundbreaking research in the 1970s and ’80s in the fields of behavioral economics and cognitive psychology. The duo challenged the prevailing assumption that humans make rational choices.

Availability bias, despite being identified nearly 50 years ago, still plagues human decision making today, even in what should be the most scientific of places: the doctor’s office.

Physicians frequently recommend a treatment plan based on the last patient they saw, rather than considering the overall probability that it will work. If a medication has a 10% complication rate, it means that 1 in 10 people will experience an adverse event. Yet, if a doctor’s most recent patient had a negative reaction, the physician is less likely to prescribe that medication to the next patient, even when it is the best option, statistically.
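
To make the arithmetic concrete, here is a minimal, illustrative Python simulation (not from the article, and the 10% and 20% complication rates are assumed) comparing a prescriber who always follows the base rate with one who overreacts to the most recent patient’s outcome:

```python
import random

random.seed(0)

BEST_RISK = 0.10   # best drug: 10% complication rate (1 in 10 patients)
ALT_RISK = 0.20    # statistically worse alternative (hypothetical)

def simulate(n_patients=100_000):
    base_complications = 0
    biased_complications = 0
    avoid_best = False  # recency-biased doctor avoids the best drug after a bad outcome

    for _ in range(n_patients):
        # Base-rate prescriber always chooses the drug with the lower risk.
        base_complications += random.random() < BEST_RISK

        # Recency-biased prescriber skips the best drug for one patient
        # whenever the previous patient on it had a complication.
        if avoid_best:
            biased_complications += random.random() < ALT_RISK
            avoid_best = False
        else:
            bad = random.random() < BEST_RISK
            biased_complications += bad
            avoid_best = bad

    return base_complications / n_patients, biased_complications / n_patients

base_rate, biased_rate = simulate()
print(f"always-best prescriber:    {base_rate:.3f} complications per patient")
print(f"recency-biased prescriber: {biased_rate:.3f} complications per patient")
```

Over many patients, the recency-driven rule produces more complications than simply prescribing the statistically best option every time.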

Confirmation bias

Have you ever had a “gut feeling” and stuck with it, even when confronted with evidence it was wrong? That’s confirmation bias. It skews our perceptions and interpretations, leading us to embrace information that aligns with our initial beliefs—and causing us to discount all indications to the contrary.

This tendency is heightened in a medical system where physicians face intense time pressures. Studies indicate that doctors, on average, interrupt patients within 11 seconds of asking “What brings you here today?” With scant information to go on, doctors quickly form a hypothesis, using additional questions, diagnostic testing and medical-record information to support their first impression.

Doctors are well trained, and their assumptions prove correct more often than not. Nevertheless, hasty decisions can be dangerous. Each year in the United States, an estimated 371,000 patients die from misdiagnoses.

Patients aren’t immune to confirmation bias, either. People with a serious medical problem commonly seek a benign explanation and find evidence to justify it. When this happens, heart attacks are dismissed as indigestion, leading to delays in diagnosis and treatment.

Framing effect

In 1981, Tversky and Kahneman asked subjects to help the nation prepare for a hypothetical viral outbreak. They explained that if the disease was left untreated, it would kill 600 people. Participants in one group were told that an available treatment, although risky, would save 200 lives. The other group was told that, despite the treatment, 400 people would die. Although both descriptions lead to the same outcome—200 people surviving and 400 dying—the first group favored the treatment, whereas the second group largely opposed it.

The study illustrates how differently people can react to identical scenarios based on how the information is framed. Researchers have discovered that the human mind experiences losses far more powerfully than equivalent gains. So, patients will consent to a chemotherapy regimen that has a 20% chance of cure but decline the same treatment when told it has an 80% likelihood of failure.
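
As a quick sanity check on the numbers (an illustrative snippet, not part of the original study), both framings describe exactly the same outcomes:

```python
population = 600

# Gain framing: "the treatment will save 200 lives."
saved = 200
# Loss framing: "despite the treatment, 400 people will die."
die = 400

# Both statements partition the same 600 people.
assert saved + die == population

# Likewise for the chemotherapy example: a 20% chance of cure
# and an 80% likelihood of failure are the same probability.
cure_rate = 0.20
failure_rate = 0.80
assert abs((1 - failure_rate) - cure_rate) < 1e-9
```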

Self-serving bias

The best parts about being a doctor are saving and improving lives. But there are other perks, as well.

Pharmaceutical and medical-device companies aggressively reward physicians who prescribe and recommend their products. Whether it’s a sponsored dinner at a Michelin-starred restaurant or even a pizza delivered to the office staff, the intention of the reward is always the same: to sway the decisions of doctors.

And yet, physicians swear that no meal or gift will influence their prescribing habits. And they believe it because of “self-serving bias.”

In the end, it’s patients who pay the price. Rather than receiving a generic prescription for a fraction of the cost, patients end up paying more for a brand-name drug because their doctor—at a subconscious level—doesn’t want to lose out on the perks.

Thanks to the “Sunshine Act,” patients can check sites like ProPublica’s Dollars for Docs to find out whether their healthcare professional is receiving drug- or device-company money (and how much).

Reducing subconscious bias

These cognitive biases may not be the reason U.S. life expectancy has stagnated for the past 20 years, but they stand in the way of positive change. And they contribute to the medical errors that harm patients.

A study published this month in JAMA Internal Medicine found that 1 in 4 hospital patients who either died or were transferred to the ICU had been affected by a diagnostic mistake. Knowing this, you might think cognitive biases would be a leading subject at annual medical conferences and a topic of grave concern among healthcare professionals. You’d be wrong. Inside the culture of medicine, these failures are commonly ignored.

The recent story of an economics professor offers one possible solution. Upon experiencing abdominal pain, he went to a highly respected university hospital. After laboratory testing and observation, his attending doctor concluded the problem wasn’t serious—a gallstone at worst. He told the patient to go home and return for outpatient workup.

The professor wasn’t convinced. Fearing that the medical problem was severe, the professor logged onto ChatGPT (a generative AI technology) and entered his symptoms. The application concluded that there was a 40% chance of a ruptured appendix. The doctor reluctantly ordered an MRI, which confirmed ChatGPT’s diagnosis.

Future generations of generative AI, pretrained with data from people’s electronic health records and fed with information about cognitive biases, will be able to spot these types of errors when they occur.

Deviation from standard practice will result in alerts, bringing cognitive errors to consciousness, thus reducing the likelihood of misdiagnosis and medical error. Rather than resisting this kind of objective second opinion, I hope clinicians will embrace it. The opportunity to prevent harm would constitute a major advance in medical care.

Opinion: The AI revolution in health care is already here

Pay attention to the media coverage around artificial intelligence, and it’s easy to get the sense that technologies such as chatbots pose an “existential crisis” to everything from the economy to democracy.

These threats are real, and proactive regulation is crucial. But it’s also important to highlight AI’s many positive applications, especially in health care.

Consider the Mayo Clinic, the largest integrated, nonprofit medical practice in the world, which has created more than 160 AI algorithms in cardiology, neurology, radiology and other specialties. Forty of those have already been deployed in patient care.

To better understand how AI is used in medicine, I spoke with John Halamka, a physician trained in medical informatics who is president of Mayo Clinic Platform. As he explained to me, “AI is just the simulation of human intelligence via machines.”

Halamka distinguished between predictive and generative AI. The former involves mathematical models that use patterns from the past to predict the future; the latter uses text or images to generate a sort of human-like interaction.

It’s that first type that’s most valuable to medicine today. As Halamka described, predictive AI can look at the experiences of millions of patients and their illnesses to help answer a simple question: “What can we do to ensure that you have the best journey possible with the fewest potholes along the way?”

For instance, let’s say someone is diagnosed with Type 2 diabetes. Instead of giving generic recommendations for anyone with the condition, an algorithm can predict the best care plan for that patient using their age, geography, racial and ethnic background, existing medical conditions and nutritional habits.
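
As a rough illustration of the idea (this is not Mayo Clinic’s actual system; the features, care plans, outcome and data below are all hypothetical), a predictive model of this kind would be trained on historical records and then asked which candidate plan is most likely to work for a specific patient:

```python
# Hypothetical sketch: rank candidate diabetes care plans for one patient
# by predicted probability of a good outcome. Features, plans, and data
# are invented for illustration; a real system would use millions of
# de-identified records and far richer inputs.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic historical records: [age, BMI, baseline A1c, plan_id]
n = 5_000
X = np.column_stack([
    rng.integers(30, 80, n),   # age
    rng.normal(31, 5, n),      # BMI
    rng.normal(8.5, 1.2, n),   # baseline A1c
    rng.integers(0, 3, n),     # which of 3 care plans was used
])
# Synthetic outcome: 1 = A1c controlled at one year (made-up relationship)
logit = -0.05 * X[:, 0] - 0.08 * X[:, 1] - 0.4 * X[:, 2] + 0.6 * X[:, 3] + 6
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Score each candidate plan for a specific (hypothetical) patient.
patient = {"age": 62, "bmi": 34.0, "a1c": 9.1}
plans = ["lifestyle + metformin", "add GLP-1 agonist", "add insulin"]
for plan_id, plan in enumerate(plans):
    features = [[patient["age"], patient["bmi"], patient["a1c"], plan_id]]
    p = model.predict_proba(features)[0, 1]
    print(f"{plan:25s} predicted chance of control: {p:.2f}")
```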

This kind of patient-centered treatment isn’t new; physicians have long been individualizing recommendations. So in this sense, predictive AI is just one more tool to aid in clinical decision-making.

The quality of the algorithm depends on the quantity and diversity of data. I was astounded to learn that the Mayo Clinic team has signed data-partnering agreements with clinical systems across the United States and globally, including in Canada, Brazil and Israel. By the end of 2023, Halamka expects the network of organizations to encompass more than 100 million patients whose medical records, with identifying information removed, will be used to improve care for others.

Predictive AI can also augment diagnoses. For example, to detect colon cancer, standard practice is for gastroenterologists to perform a colonoscopy and manually identify and remove precancerous polyps. But some studies estimate that 1 in 4 cancerous lesions are missed during screening colonoscopies.

Predictive AI can dramatically improve detection. The software has been “trained” to identify polyps by looking at many pictures of them, and when it detects one during the colonoscopy, it alerts the physician to take a closer look. One randomized controlled trial at eight centers in the United States, Britain and Italy found that using such AI reduced the miss rate of potentially cancerous lesions by more than half, from 32.4 percent to 15.5 percent.

Halamka made a provocative statement that within the next five years, it could be considered malpractice not to use AI in colorectal cancer screening.

But he was also careful to point out that “it’s not AI replacing a doctor, but AI augmenting a doctor to provide additional insight.” There is so much unmet need that technology won’t reduce the need for health-care providers; instead, he argued, “we’ll be able to see more patients and across more geographies.”

Generative AI, on the other hand, is a “completely different kind of animal,” Halamka said. Some tools, such as ChatGPT, are trained on uncurated materials found on the internet. Because the inputs themselves contain inaccurate information, the models can produce inappropriate and misleading text. Moreover, whereas the quality of predictive AI can be measured, generative AI models produce different answers to the same question each time, making validation more challenging.

At the moment, there are too many concerns over quality and accuracy for generative AI to direct clinical care. Still, it holds tremendous potential as a method to reduce administrative burden. Some clinics are already using apps that automatically transcribe a patient’s visit. Instead of creating the medical record from scratch, physicians can edit the transcript, saving them valuable time.

Though Halamka is clearly a proponent of AI’s use in medicine, he urges federal oversight. Just as the Food and Drug Administration vets new medications, there should be a process to independently validate algorithms and share results publicly. Moreover, Halamka is championing efforts to prevent AI applications from perpetuating existing biases in health care.

This is a cautious and thoughtful approach. Just like any tool, AI must be studied rigorously and deployed carefully, while heeding the warning to “first, do no harm.”

Nevertheless, AI holds incredible promise to make health care safer, more accessible and more equitable.

In scramble to respond to Covid-19, hospitals turned to models with high risk of bias

Of 26 health systems surveyed by MedCity News, nearly half used automated tools to respond to the Covid-19 pandemic, but none of them were regulated. Even as some hospitals continued using these algorithms, experts cautioned against their use in high-stakes decisions.

A year ago, Michigan Medicine faced a dire situation. In March of 2020, the health system predicted it would have three times as many patients as its 1,000-bed capacity — and that was the best-case scenario. Hospital leadership prepared for this grim prediction by opening a field hospital in a nearby indoor track facility, where patients could go if they were stable, but still needed hospital care. But they faced another predicament: How would they decide who to send there?

Two weeks before the field hospital was set to open, Michigan Medicine decided to use a risk model developed by Epic Systems to flag patients at risk of deterioration. Patients were given a score of 0 to 100, intended to help care teams determine if they might need an ICU bed in the near future. Although the model wasn’t developed specifically for Covid-19 patients, it was the best option available at the time, said Dr. Karandeep Singh, an assistant professor of learning health sciences at the University of Michigan and chair of Michigan Medicine’s clinical intelligence committee. But there was no peer-reviewed research to show how well it actually worked.

Researchers tested it on over 300 Covid-19 patients between March and May. They were looking for scores that would indicate when patients would need to go to the ICU, and if there was a point where patients almost certainly wouldn’t need intensive care.

“We did find a threshold where if you remained below that threshold, 90% of patients wouldn’t need to go to the ICU,” Singh said. “Is that enough to make a decision on? We didn’t think so.”
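
The kind of check Singh describes amounts to asking, for a candidate cutoff, what share of patients who scored below it ultimately avoided the ICU. Here is a minimal sketch of that calculation; the scores and outcomes are synthetic, not Michigan Medicine’s data or Epic’s actual model:

```python
# Minimal sketch of threshold evaluation for a 0-100 deterioration score.
# Scores and ICU outcomes are simulated; the real analysis used roughly
# 300 Covid-19 patients.
import numpy as np

rng = np.random.default_rng(42)

n = 300
scores = rng.uniform(0, 100, n)                       # hypothetical 0-100 scores
# Synthetic outcome: higher scores are more likely to need the ICU.
needed_icu = rng.random(n) < (scores / 100) * 0.6

def share_below_avoiding_icu(threshold):
    """Of patients scoring below the threshold, what share never needed the ICU?"""
    below = scores < threshold
    if below.sum() == 0:
        return float("nan"), 0
    return 1 - needed_icu[below].mean(), int(below.sum())

for t in (20, 30, 40, 50):
    share, n_below = share_below_avoiding_icu(t)
    print(f"threshold {t:>3}: {share:.0%} of the {n_below} patients below it avoided the ICU")
```

A cutoff where 90% of low scorers avoid the ICU, as in Singh’s description, still means 1 in 10 of those patients would have been sent away from intensive care.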

But if the number of patients were to far exceed the health system’s capacity, it would be helpful to have some way to assist with those decisions.

“It was something that we definitely thought about implementing if that day were to come,” he said in a February interview.

Thankfully, that day never came.

The survey
Michigan Medicine is one of 80 hospitals contacted by MedCity News between January and April in a survey of decision-support systems implemented during the pandemic. 
Of the 26 respondents, 12 used machine learning tools or automated decision systems as part of their pandemic response. Larger hospitals and academic medical centers used them more frequently.

Faced with scarcities in testing, masks, hospital beds and vaccines, several of the hospitals turned to models as they prepared for difficult decisions. The deterioration index created by Epic was one of the most widely implemented — more than 100 hospitals are currently using it — but in many cases, hospitals also formulated their own algorithms.

They built models to predict which patients were most likely to test positive when shortages of swabs and reagents backlogged tests early in the pandemic. Others developed risk-scoring tools to help determine who should be contacted first for monoclonal antibody treatment, or which Covid patients should be enrolled in at-home monitoring programs.

MedCity News also interviewed hospitals on their processes for evaluating software tools to ensure they are accurate and unbiased. Currently, the FDA does not require certain clinical decision-support systems to be cleared as medical devices, leaving the developers of these tools and the hospitals that implement them responsible for vetting them.

Among the hospitals that published efficacy data, some of the models were only evaluated through retrospective studies. This can pose a challenge in figuring out how clinicians actually use them in practice, and how well they work in real time. And while some of the hospitals tested whether the models were accurate across different groups of patients — such as people of a certain race, gender or location — this practice wasn’t universal.
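
Checking a model’s accuracy across different groups of patients usually means computing the same performance metric separately for each subgroup and looking for gaps. A bare-bones version of that check might look like the sketch below; the column names and data are hypothetical, and a real audit would use the hospital’s own records and a richer set of metrics:

```python
# Bare-bones subgroup performance audit: same metric, computed per group.
# DataFrame columns (risk_score, deteriorated, race, sex) are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score

def audit_by_group(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    rows = []
    for group, sub in df.groupby(group_col):
        if sub["deteriorated"].nunique() < 2:
            continue  # AUC is undefined if a subgroup has only one outcome class
        rows.append({
            group_col: group,
            "n": len(sub),
            "auc": roc_auc_score(sub["deteriorated"], sub["risk_score"]),
            "event_rate": sub["deteriorated"].mean(),
        })
    return pd.DataFrame(rows).sort_values("auc")

# Usage (assuming `patients` holds one row per patient with the columns above):
# print(audit_by_group(patients, "race"))
# print(audit_by_group(patients, "sex"))
```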

As more companies spin up these models, researchers cautioned that they need to be designed and implemented carefully, to ensure they don’t yield biased results.

An ongoing review of more than 200 Covid-19 risk-prediction models found that the majority had a high risk of bias, meaning the data they were trained on might not represent the real world.

“It’s that very careful and non-trivial process of defining exactly what we want the algorithm to be doing,” said Ziad Obermeyer, an associate professor of health policy and management at UC Berkeley who studies machine learning in healthcare. “I think an optimistic view is that the pandemic functions as a wakeup call for us to be a lot more careful in all of the ways we’ve talked about with how we build algorithms, how we evaluate them, and what we want them to do.”

Algorithms can’t be a proxy for tough decisions
Concerns about bias are not new to healthcare. In a paper published two years ago, Obermeyer found that a tool used by several hospitals to prioritize high-risk patients for additional care resources was biased against Black patients. By equating patients’ health needs with the cost of care, the developers built an algorithm that yielded discriminatory results.
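
The mechanism is easy to reproduce in miniature: if one group generates less spending than another at the same level of illness, a model trained to predict cost will rank that group as lower "risk" even when health needs are identical. The toy simulation below uses entirely synthetic numbers and is only meant to show the direction of the effect, not reproduce the paper’s results:

```python
# Toy illustration of label-choice bias: predicting cost instead of need.
# All numbers are synthetic; see Obermeyer et al. (Science, 2019) for the real analysis.
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

group_b = rng.random(n) < 0.5                    # disadvantaged group indicator
need = rng.gamma(shape=2.0, scale=1.0, size=n)   # true health need, same in both groups

# Spending tracks need, but group B generates ~30% less spending at the same
# level of need (e.g., because of unequal access to care).
cost = need * np.where(group_b, 0.7, 1.0) * rng.lognormal(0, 0.2, n)

# "Algorithm": flag the top 10% by predicted cost (here, cost itself).
flagged = cost >= np.quantile(cost, 0.90)

print("Share of group B in the population:     ", round(group_b.mean(), 2))
print("Share of group B among flagged patients:", round(group_b[flagged].mean(), 2))
```

Even though both groups are equally sick, the cost-based ranking flags far fewer patients from the group whose care costs less.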

More recently, a rule-based system developed by Stanford Medicine to determine who would get the Covid-19 vaccine first ended up prioritizing administrators and doctors who were seeing patients remotely, leaving out most of its 1,300 residents who had been working on the front lines. After an uproar, the university attributed the errors to a “complex algorithm,” though there was no machine learning involved.

Both examples highlight the importance of thinking through what exactly a model is designed to do — and not using it as a proxy to avoid the hard questions.

“The Stanford thing was another example of, we wanted the algorithm to do A, but we told it to do B. I think many health systems are doing something similar,” Obermeyer said. “You want to give the vaccine first to people who need it the most — how do we measure that?”

The urgency that the pandemic created was a complicating factor. With little information and few proven systems to work with in the beginning, health systems began throwing ideas at the wall to see what worked. One expert questioned whether people might be abdicating some responsibility to these tools.

“Hard decisions are being made at hospitals all the time, especially in this space, but I’m worried about algorithms being the idea of where the responsibility gets shifted,” said Varoon Mathur, a technology fellow at NYU’s AI Now Institute, in a Zoom interview. “Tough decisions are going to be made, I don’t think there are any doubts about that. But what are those tough decisions? We don’t actually name what constraints we’re hitting up against.”

The wild, wild west
There currently is no gold standard for how hospitals should implement machine learning tools, and little regulatory oversight for models designed to support physicians’ decisions, resulting in an environment that Mathur described as the “wild, wild west.”

How these systems were used varied significantly from hospital to hospital.

Early in the pandemic, Cleveland Clinic used a model to predict which patients were most likely to test positive for the virus as tests were limited. Researchers developed it using health record data from more than 11,000 patients in Ohio and Florida, including 818 who tested positive for Covid-19. Later, they created a similar risk calculator to determine which patients were most likely to be hospitalized for Covid-19, which was used to prioritize which patients would be contacted daily as part of an at-home monitoring program.

Initially, anyone who tested positive for Covid-19 could enroll in this program, but as cases began to tick up, “you could see how quickly the nurses and care managers who were running this program were overwhelmed,” said Dr. Lara Jehi, Chief Research Information Officer at Cleveland Clinic. “When you had thousands of patients who tested positive, how could you contact all of them?”

While the tool included dozens of factors, such as a patient’s age, sex, BMI, zip code, and whether they smoked or got their flu shot, it’s also worth noting that demographic information significantly changed the results. For example, a patient’s race “far outweighs” any medical comorbidity when used by the tool to estimate hospitalization risk, according to a paper published in PLOS ONE. Cleveland Clinic recently made the model available to other health systems.

Others, like Stanford Health Care and 731-bed Santa Clara County Medical Center, started using Epic’s clinical deterioration index before developing their own Covid-specific risk models. At one point, Stanford developed its own risk-scoring tool, which was built using past data from other patients who had similar respiratory diseases, such as the flu, pneumonia, or acute respiratory distress syndrome. It was designed to predict which patients would need ventilation within two days, and someone’s risk of dying from the disease at the time of admission.

Stanford tested the model to see how it worked on retrospective data from 159 patients who were hospitalized with Covid-19, and cross-validated it with Salt Lake City-based Intermountain Healthcare, a process that took several months. Although this gave some additional assurance — Salt Lake City and Palo Alto have very different populations, smoking rates and demographics — it still wasn’t representative of some patient groups across the U.S.

“Ideally, what we would want to do is run the model specifically on different populations, like on African Americans or Hispanics and see how it performs to ensure it’s performing the same for different groups,” Tina Hernandez-Boussard, an associate professor of medicine, biomedical data science and surgery at Stanford, said in a February interview. “That’s something we’re actively seeking. Our numbers are still a little low to do that right now.”

Stanford planned to implement the model earlier this year, but ultimately tabled it as Covid-19 cases fell.

‘The target is moving so rapidly’
Although large medical centers were more likely to have implemented automated systems, there were a few notable holdouts. For example, UC San Francisco Health, Duke Health and Dignity Health all said they opted not to use risk-prediction models or other machine learning tools in their pandemic responses.

“It’s pretty wild out there and I’ll be honest with you —  the dynamics are changing so rapidly,” said Dr. Erich Huang, chief officer for data quality at Duke Health and director of Duke Forge. “You might have a model that makes sense for the conditions of last month but do they make sense for the conditions of next month?”

That’s especially true as new variants spread across the U.S., and more adults are vaccinated, changing the nature and pace of the disease. But other, less obvious factors might also affect the data. For instance, Huang pointed to big differences in social mobility across the state of North Carolina, and whether people complied with local restrictions. Differing social and demographic factors across communities, such as where people work and whether they have health insurance, can also affect how a model performs.

“There are so many different axes of variability, I’d feel hard pressed to be comfortable using machine learning or AI at this point in time,” he said. “We need to be careful and understand the stakes of what we’re doing, especially in healthcare.”

Leadership at one of the largest public hospitals in the U.S., 600-bed LAC+USC Medical Center in Los Angeles, also steered away from using predictive models, even as it faced an alarming surge in cases over the winter months.

At most, the hospital used alerts to remind physicians to wear protective equipment when a patient had tested positive for Covid-19.

“My impression is that the industry is not anywhere near ready to deploy fully automated stuff just because of the risks involved,” said Dr. Phillip Gruber, LAC+USC’s chief medical information officer. “Our institution and a lot of institutions in our region are still focused on core competencies. We have to be good stewards of taxpayer dollars.”

When the data itself is biased
Developers have to contend with the fact that any model developed in healthcare will be biased, because the data itself is biased; how people access and interact with health systems in the U.S. is fundamentally unequal.

How that information is recorded in electronic health record (EHR) systems can also be a source of bias, NYU’s Mathur said. People don’t always self-report their race or ethnicity in a way that fits neatly within the parameters of an EHR. Not everyone trusts health systems, and many people struggle to even access care in the first place.

“Demographic variables are not going to be sharply nuanced. Even if they are… in my opinion, they’re not clean enough or good enough to be nuanced into a model,” Mathur said.

The information hospitals have had to work with during the pandemic is particularly messy. Differences in testing access and missing demographic data also affect how resources are distributed and shape other responses to the pandemic.

“It’s very striking because everything we know about the pandemic is viewed through the lens of number of cases or number of deaths,” UC Berkeley’s Obermeyer said. “But all of that depends on access to testing.”

At the hospital level, internal data wouldn’t be enough to truly follow whether an algorithm to predict adverse events from Covid-19 was actually working. Developers would have to look at social security data on mortality, or whether the patient went to another hospital, to track down what happened.

“What about the people a physician sends home —  if they die and don’t come back?” he said.

Researchers at Mount Sinai Health System tested a machine learning tool to predict critical events in Covid-19 patients —  such as dialysis, intubation or ICU admission — to ensure it worked across different patient demographics. But they still ran into their own limitations, even though the New York-based hospital system serves a diverse group of patients.

They tested how the model performed across Mount Sinai’s different hospitals. In some cases, when the model wasn’t very robust, it yielded different results, said Benjamin Glicksberg, an assistant professor of genetics and genomic sciences at Mount Sinai and a member of its Hasso Plattner Institute for Digital Health.

They also tested how it worked in different subgroups of patients to ensure it didn’t perform disproportionately better for patients from one demographic.

“If there’s a bias in the data going in, there’s almost certainly going to be a bias in the data coming out of it,” he said in a Zoom interview. “Unfortunately, I think it’s going to be a matter of having more information that can approximate these external factors that may drive these discrepancies. A lot of that is social determinants of health, which are not captured well in the EHR. That’s going to be critical for how we assess model fairness.”

Even after checking whether a model yields fair and accurate results, the work isn’t done. Hospitals must continue to validate their models continuously to ensure they’re still working as intended — especially in a situation as fast-moving as a pandemic.
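
In practice, "validate continuously" often means recomputing the same performance metrics on each new window of patients and flagging drift. A skeletal version of that loop is sketched below; the thresholds, baseline value and column names are assumptions, not any hospital’s actual monitoring setup:

```python
# Skeletal continuous-monitoring loop: recompute performance on each new
# window of scored patients and flag drift. Thresholds and columns are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score, brier_score_loss

BASELINE_AUC = 0.80   # performance observed at initial validation (assumed)
ALERT_DROP = 0.05     # alert if AUC falls more than this below baseline

def monitor(window: pd.DataFrame) -> dict:
    """Evaluate one time window of scored patients with known outcomes."""
    auc = roc_auc_score(window["outcome"], window["score"])
    brier = brier_score_loss(window["outcome"], window["score"])
    return {
        "n": len(window),
        "auc": auc,
        "brier": brier,
        "alert": auc < BASELINE_AUC - ALERT_DROP,
    }

# Usage (assuming `scored` has columns: score in [0, 1], outcome in {0, 1}, month):
# for month, window in scored.groupby("month"):
#     print(month, monitor(window))
```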

A bigger role for regulators
All of this is stirring up a broader discussion about how much of a role regulators should have in how decision-support systems are implemented.

Currently, the FDA does not require most software that provides diagnosis or treatment recommendations to clinicians to be regulated as a medical device. Even software tools that have been cleared by the agency lack critical information on how they perform across different patient demographics. 

Of the hospitals surveyed by MedCity News, none of the models they developed had been cleared by the FDA, and most of the external tools they implemented also hadn’t gone through any regulatory review.

In January, the FDA shared an action plan for regulating AI as a medical device. Although most of the concrete plans were around how to regulate algorithms that adapt over time, the agency also indicated it was thinking about best practices, transparency, and methods to evaluate algorithms for bias and robustness.

More recently, the Federal Trade Commission warned that it could crack down on AI bias, citing a paper arguing that AI could worsen existing healthcare disparities if bias is not addressed.

“My experience suggests that most models are put into practice with very little evidence of their effects on outcomes because they are presumed to work, or at least to be more efficient than other decision-making processes,” Kellie Owens, a researcher for Data & Society, a nonprofit that studies the social implications of technology, wrote in an email. “I think we still need to develop better ways to conduct algorithmic risk assessments in medicine. I’d like to see the FDA take a much larger role in regulating AI and machine learning models before their implementation.”

Developers should also ask themselves if the communities they’re serving have a say in how the system is built, or whether it is needed in the first place. The majority of hospitals surveyed did not tell patients when a model was used in their care, nor did they involve patients in the development process.

In some cases, the best option might be the simplest one: don’t build.

In the meantime, hospitals are left to sift through existing published data, preprints and vendor promises to decide on the best option. To date, Michigan Medicine’s paper is still the only one that has been published on Epic’s Deterioration Index.

Care teams there used Epic’s score as a support tool for rapid response teams checking in on patients. But the health system was also looking at other options.

“The short game was that we had to go with the score we had,” Singh said. “The longer game was, Epic’s deterioration index is proprietary. That raises questions about what is in it.”