Introduction
Machine learning (ML) algorithms are evolving to tackle increasingly complex clinical challenges.1 The general appeal is that clinical practice will likely benefit from algorithm-assisted decision-making through optimised clinical workflows, improved diagnostic interventions and enhanced personalised precision care. Some insights derived from algorithms will likely assist in clinical decision-making where patients’ lives are at risk. Therefore, ML algorithms that become integrated into decision-making in clinical practice must be robust, reliable and unbiased.
Healthcare disparities resulting from discrimination and bias are pervasive, and they can be encoded in algorithms trained on clinical data from electronic health records (EHRs). Existing inequities can be perpetuated or even magnified by algorithms developed to inform decision-making, through bias in the data used to train the models or bias introduced during model development, deployment or postdeployment monitoring. Any of these can result in decision-making that is discriminatory and harmful to socially disadvantaged population groups inadequately represented in the data.
There is overwhelming evidence that race or ethnicity impacts clinical decision-making.2 Hispanic patients seen by non-Hispanic providers received breast and colorectal cancer screening at higher rates than Hispanic patients seen by Hispanic providers.3 Greenwood and colleagues reported a 58% reduction in mortality of Black newborns when under the care of Black physicians compared with White physicians.4 Despite reporting greater pain and pain-related disability, minority patients are more likely to receive inadequate pain treatment compared with White patients.5 6 Treatment variation across race or ethnicity not explainable by patient or disease factors has been detailed in several studies, accompanied by evidence of unconscious bias in healthcare providers’ attitudes, expectations and behaviour.7–9 This type of bias in medical practice is further amplified if discriminatory attitudes and behaviours are in turn built into models of disease mechanisms or into decision support algorithms implemented by care providers. Brooks describes this phenomenon in an opinion piece that frames unconscious bias as a ‘silent curriculum’.10 Furthermore, a recent study illustrates racial bias in patients’ EHRs, showing that Black patients are 2.5 times more likely than White patients to have one or more negative descriptors in their records.11
Bias embedded in data has been illustrated by Obermeyer et al, who showed that ‘at a given risk score, Black patients are considerably sicker than White patients, as evidenced by signs of uncontrolled illnesses’. The algorithm learnt to predict care costs, placing Black patients with considerably worse symptoms in the same risk category as a subset of White patients.12 Compounding the problem, the number of Black patients who should have been referred for complex care was halved. Racial disparities have also been observed in blood pressure, with Black patients having higher blood pressure.13 14 Moreover, patients with darker skin colour are at greatest risk of hypovitaminosis D, which may result in microvascular endothelial dysfunction.15
Given the evidence of racial and ethnic bias in the intuition and judgement of healthcare providers, there is concern that algorithms trained to predict and optimise outcomes may implicitly infer self-identified race or ethnicity and use it to inform decision-making, even when these attributes are not provided during training.
While bias in clinical scores is well documented, less is known about bias in routinely collected information that is essential for clinical decision-making, namely vital signs. We investigate whether self-identified race or ethnicity can be learnt from four vital signs alone. Throughout, we use the term ethnicity to refer to self-identified race or ethnicity (https://www.ethnicity-facts-figures.service.gov.uk/style-guide/writing-about-ethnicity/).
Our results show that models can predict patients’ ethnicity with an area under the curve (AUC) of 0.74 (±0.030) between White and Black patients, an AUC of 0.74 (±0.030) between Hispanic and Black patients and an AUC of 0.67 (±0.072) between Hispanic and White patients. If sensitive attributes are so easily learnt from essential clinical data, there is significant concern that they could become an embedded part of clinical decision-making and treatment optimisation, leading to patient harm.
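To make the task concrete, the sketch below shows one way such a pairwise comparison could be set up: a binary classifier trained on four vital signs, with cross-validated AUC reported as mean ± SD in the same form as the figures above. The file name, column names and choice of gradient-boosting model are illustrative assumptions, not the study’s actual pipeline.

```python
# Illustrative sketch only: distinguish two self-identified ethnicity groups
# from four vital signs and report cross-validated AUC (mean ± SD).
# The file name and column names ("heart_rate", "resp_rate", "sbp", "spo2",
# "ethnicity") are hypothetical assumptions, not the study's data schema.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("vitals_cohort.csv")                 # hypothetical cohort extract
pair = df[df["ethnicity"].isin(["Black", "White"])]   # one pairwise comparison

X = pair[["heart_rate", "resp_rate", "sbp", "spo2"]]  # four vital signs as features
y = (pair["ethnicity"] == "Black").astype(int)        # binary label for this pair

# 5-fold cross-validated AUC, summarised as mean ± SD as in the results above
aucs = cross_val_score(GradientBoostingClassifier(random_state=0), X, y,
                       cv=5, scoring="roc_auc")
print(f"AUC {aucs.mean():.2f} (±{aucs.std():.3f})")
```

The same analysis would be repeated for each ethnicity pair; the sketch is intended only to clarify what ‘predicting ethnicity from vital signs alone’ means operationally.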
Our findings add to the growing body of evidence pointing to structural bias in healthcare systems, where even seemingly objective physiological data can perpetuate inequities. An important corollary is that physiological data may present a biased substrate for ML models. This dual interpretation emphasises both the need for caution in using ML for clinical decision-making and the potential for physiological data to embed racial and ethnic biases, often rooted in how measurement devices are designed, tested or used in clinical settings. These biases can carry over into ML models trained on such data, further amplifying disparities in healthcare outcomes if not carefully addressed.