Discussion
The models developed in our study are intended to complement the expertise of experienced clinicians, enhancing efficiency by predicting the decisions that clinicians are likely to make or by supporting them in reaching decisions more promptly. Interpretability is therefore crucial for establishing trust and for mitigating potential biases in the models.20 Many previous studies built black-box models using ML algorithms such as gradient boosting, random forest and deep learning,8 10 12 14–16 which lack effective interpretability. In contrast, our models prioritised interpretability, enabling clinicians to cross-reference predictions against their own judgments, particularly when the two diverge, thereby enhancing clinical decision-making.
Our study emphasised meticulous data pre-processing and processing to construct interpretable models with robust performance. Unlike most existing studies,5 6 11 we included all available features at the outset. After feature selection, 21 features were used in building the models, rather than just a few. A key contribution was the implementation of target encoding and a method for handling missing categorical and numeric values, which differed from conventional methods that replace missing values with the most frequent value, the mean or the median.5 12 23 Moreover, while maintaining interpretability, our study is the first to effectively address the challenging U-shaped correlation issue, a factor overlooked by previous studies developing generalised linear models in EDs.5 11 13 17 Owing to these rigorous data pre-processing and processing efforts, the evaluation of eight ML algorithms revealed only minor performance differences between the interpretable lasso models and the black-box models, allowing us to select lasso for the final 12 models.
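The target-encoding and missing-value strategy described above can be illustrated with a minimal sketch (the column and target names below are hypothetical; the study does not publish its implementation). The idea is that a missing category is retained as a category of its own and encoded with the training-set target mean, rather than being imputed with the most frequent value:

```python
import pandas as pd

def target_encode(train, test, col, target, missing_token="__missing__"):
    """Encode a categorical column by the mean of the binary target
    observed in training; missing values form their own category."""
    tr = train[col].fillna(missing_token)
    te = test[col].fillna(missing_token)
    means = train.assign(_c=tr).groupby("_c")[target].mean()
    prior = train[target].mean()  # fallback for categories unseen in training
    return tr.map(means), te.map(means).fillna(prior)

# Toy data (illustrative feature/target names, not from the study)
train = pd.DataFrame({"triage_complaint": ["chest pain", None, "fall", "fall"],
                      "los_over_4h": [1, 0, 0, 1]})
test = pd.DataFrame({"triage_complaint": ["fall", "fever", None]})
train_enc, test_enc = target_encode(train, test, "triage_complaint", "los_over_4h")
```

In practice such encodings are usually computed with cross-fitting or smoothing to limit target leakage; the sketch omits that for brevity.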
Using a model trained on data from one site at new sites comes at the cost of reduced accuracy, and global models trained on aggregate data from all sites are less accurate at individual sites than site-specific models.14 21 This phenomenon is frequently attributed to variations in operational practices, local patient populations and local care protocols among hospitals.9 14 Even within the same hospital, evaluating a model with later data can result in reduced accuracy due to concept drift—changes in data and model requirements over time.25 Previous studies lacked a transparent data analysis framework for developing ML models for LOS and DD predictions, with unclear procedures and critical omissions hindering the replication of these analyses using alternative datasets. Our study addresses these gaps by detailing the steps for data pre-processing, processing, modelling and validation. It is the first to demonstrate a transparent data analysis framework for developing interpretable ML models with robust performance in predicting LOS and DD in EDs. This framework enables easy adaptation by other institutions using their own datasets, even if their features differ from ours. While the framework also makes it possible to address concept drift through drift adaptation,26 resolving this issue in detail is beyond the scope of this study.
The models for predicting LOS and DD in the ED identified several key variables, which align well with current clinical practice. Notably, the feature ‘Average waiting time’ ranked among the top 10 features for all three models predicting binary LOS but was not significant for any model predicting binary DD. This suggests that waiting time influences a patient’s LOS but does not impact the DD made by clinicians. Additionally, models predicting binary LOS and DD at 120 min post-triage shared 8 out of 10 top features: ‘Age’, ‘Arrival transport mode’, ‘Intravenous catheter site’, ‘Intravenous catheter site assessment’, ‘Intravenous catheter size (gauge)’, ‘Order count’, ‘Postcode’ and ‘Triage complaint’. This indicates a natural association between LOS and DD. In simple terms, after accounting for variations in waiting time, a patient whose care is completed within 4 hours might be best discharged directly from the ED, one who requires 4–24 hours of care might be well suited to a short stay unit and one needing over 24 hours of care may be best served on an inpatient ward.
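The rule of thumb in the last sentence can be written out as a simple mapping (illustrative only—this is a paraphrase of the text, not the study's model output, and the function name is hypothetical):

```python
def suggest_disposition(predicted_los_hours):
    """Map a predicted ED length of stay to the disposition suggested
    in the discussion (after accounting for variations in waiting time)."""
    if predicted_los_hours < 4:
        return "discharge from ED"
    if predicted_los_hours <= 24:
        return "short stay unit"
    return "inpatient ward"
```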
Using predicted probabilities empowers clinicians to optimise resource allocation and enhance patient flow efficiency. For instance, consider the patient highlighted in table 2 with a predicted probability of exceeding a 4-hour LOS of 0.973, notably surpassing the threshold of 0.429. An EPR marker indicating a high probability of LOS exceeding 4 hours can effectively direct a clinician’s attention to the right patient and the relevant data through personalised interpretations, facilitating timely and intuitive DD. This approach is particularly valuable for communicating decisions to both patients and colleagues. Similarly, predicted probabilities of DD can provide valuable insights to aid decision-making processes. For more discussion, see online supplemental appendix A.
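As a sketch of how such an EPR marker might consume the model output (the function name is hypothetical; 0.429 is the threshold quoted above, and the default shown here is an assumption):

```python
def flag_long_stay(predicted_prob, threshold=0.429):
    """Flag a patient whose predicted probability of LOS > 4 h exceeds
    the decision threshold, e.g. to raise an EPR marker."""
    return predicted_prob >= threshold

# The table 2 patient: predicted probability 0.973 vs threshold 0.429.
flag_long_stay(0.973)  # → True, so the marker would be raised
```

In a deployment, the threshold would be tuned per model and per site against the desired sensitivity/specificity trade-off rather than hard-coded.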
Limitations
Our study is subject to several limitations. First, while our data pre-processing and processing procedures were rigorous, they were not exhaustive. For example, some potentially useful features, such as ‘ICD10Code’, a hierarchical categorical feature with thousands of unique values that could capture disease type and severity, were not incorporated. Second, during data processing, both extreme (invalid) outliers and genuine outliers are attenuated in the first and last intervals; however, some genuine outliers may be powerful indicators of severe disease. Third, the performance of the ternary LOS and DD predictions may not be optimal. Fourth, our models make predictions at fixed time points (10, 60 and 120 min post-triage), which may limit their flexibility; predictions at any time during the ED stay could improve adaptability and precision. Finally, subgroup analyses related to age, ethnicity and specific diagnoses have not yet been conducted. These limitations present opportunities for further exploration in future research.