Tools may generate inaccurate and unsafe advice if their models have been trained on inadequate or unrepresentative (biased) data,24 are used in an inappropriate clinical setting or context, misinterpret minor data set shifts that clinicians know to ignore or account for (ie, changes in patient, clinical practice or equipment characteristics), or under-sense (too few alerts, resulting in harm) or over-sense (too many alerts, causing alert fatigue) (box 2). The data required to operate the tool must be accurate, representative and readily accessible when needed, and models must be resilient to class imbalance (ie, the outcomes being predicted are infrequent) and label leakage (ie, using image background or other artefacts, rather than clinically relevant features, to make predictions).
Calibrating artificial intelligence tools in optimising clinical utility
Failure to recognise clinical deterioration in hospital due to sepsis or other potentially life-threatening conditions is a leading cause of in-hospital death and unplanned transfers to intensive care units. Early warning systems (EWS) can predict a patient’s risk of clinical deterioration and potentially allow clinicians to intervene earlier. Current EWS comprise simple prediction rules that estimate risk from a small number of input variables, usually fewer than 10, such as vital signs. The rules offer only a narrow time window, usually less than 12 hours, in which to trigger an alert before overt deterioration activates a medical emergency team response. The rules are also prone to false-positive alerts, which induce alert fatigue. An EWS that uses machine learning could make more accurate and timely predictions, given its ability to take hundreds of variables as input.
The ideal prediction tool should miss very few cases of clinical deterioration (high sensitivity) and not overcall cases with no deterioration (high specificity). Clinicians may decide the tool should aim for no more than two false alerts for every true positive alert, in order to balance the time required to assess patients who trigger an alert against other competing demands. The data scientists would then set the threshold for categorising patients as high risk at a positive predictive value of around 30% (two false alerts per true alert means roughly one alert in three is correct). At this threshold, based on historical data, the sensitivity may be only 50%, but clinicians may decide this would be a useful proportion of cases to detect. Clinicians may find the tool more useful if it can predict events within the following 48 hours. A shorter window would not leave enough time to intervene, and a longer window would make it difficult for clinicians to know how to respond.
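The threshold-setting step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular EWS: the function name `pick_alert_threshold`, the parameter `max_false_per_true` and the toy risk scores and outcomes are all hypothetical, and a patient is assumed to be flagged when the model's risk score is at or above the threshold.

```python
from typing import List, Optional, Tuple

# Hypothetical historical data: model risk scores and observed outcomes
# (1 = deteriorated within the prediction window, 0 = did not).
TOY_SCORES = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
TOY_OUTCOMES = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]


def pick_alert_threshold(
    scores: List[float],
    outcomes: List[int],
    max_false_per_true: float = 2.0,
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, ppv, sensitivity) for the lowest threshold that
    keeps false alerts within the agreed ratio, or None if none does.

    A patient is flagged when score >= threshold (an assumption of this sketch).
    """
    total_events = sum(outcomes)
    best = None
    # Walk thresholds from strictest to most permissive; the last one that
    # satisfies the ratio is the lowest, so it detects the most true events.
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, outcomes) if s >= threshold]
        true_alerts = sum(flagged)
        false_alerts = len(flagged) - true_alerts
        if true_alerts > 0 and false_alerts <= max_false_per_true * true_alerts:
            ppv = true_alerts / len(flagged)
            sensitivity = true_alerts / total_events
            best = (threshold, ppv, sensitivity)
    return best


if __name__ == "__main__":
    print(pick_alert_threshold(TOY_SCORES, TOY_OUTCOMES))
```

With the toy data, the lowest threshold that respects the two-to-one limit is 0.7, where one of three alerts is a true positive (positive predictive value of one third) and one of the two deterioration events is detected (sensitivity of 50%). Tightening or relaxing `max_false_per_true` shows how the agreed alert burden trades off against the proportion of cases detected.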
Input from clinician users is required when adjusting sensitivity thresholds to strike the right balance between clinician workload and patient safety. Such adjustments will also vary according to the criticality of the event being predicted, for example, pressure sores versus septic shock.
For all these reasons, rigorous external validation, confirming acceptable model performance when the tool is used in different populations by different clinicians,25 is paramount, together with an ability to retrain models on local data if performance is found to be suboptimal. Importantly, clinicians want to know when and for whom a tool should, and should not, be used (ie, clear, transparent task specification). Ideally, information should be forthcoming about how the model was trained, who was included in the data set, how well it performs, who funded its development and what assumptions or conditions should be satisfied for its use.26 Tool developers should share model code and input features to allow other researchers to reproduce and reconfirm model performance using different data sets from different settings.