Discussion
In this study, we sought to predict changes in laboratory parameters induced by the administration of labile blood products during haemorrhage associated with hepatic transplantation. Unfortunately, the performance metrics achieved by the various models are insufficient to permit their use in daily clinical practice by our anaesthesiology colleagues without several precautions. Several factors may contribute to this outcome.
First, the selection of biological parameters we focused on—namely, PT, fibrinogen levels and aPTT—may be questionable in the specific context of hepatic transplantation. Pre-existing hepatic insufficiency leads to significant biochemical alterations and altered hepatic reactivity, which can modify systemic metabolism as well as the dynamics and consumption of various coagulation factors during the procedure.25–28
Second, the duration of the study and the nature of the database may have affected the results. The data were derived from a patient population spanning more than 20 years, during which patient demographics, surgical techniques, graft preservation methods and overall management strategies have changed. Such a long period has both advantages and disadvantages. On the one hand, it enabled us to include many patients, giving us more data to analyse and potentially increasing the power of our study. On the other hand, we face the risk of changes in management protocols, whether surgical, anaesthetic or transfusion related. However, insofar as each blood product was studied separately in the machine learning models, we felt that this disadvantage was reduced by our methodology.
Third, certain arbitrary choices made in our study design may have introduced biases. Specifically, the definition and creation of the ‘transfusion event’ could be questioned and might be refined in future research, especially in light of the results presented here. Questions arise as to whether the time period should be shortened or whether inclusion criteria should be modified to incorporate additional selection parameters. The definition itself rests on only a few parameters. For example, we could instead have considered that a significant change in blood pressure defined the start and end of a haemorrhage, and likewise of a transfusion.
Furthermore, we did not study the administration of synthetic products, purely for reasons of data collection: we had reliable information only for products from the blood bank. Synthetic products were not recorded reliably enough in the patient files to be used.
We must also consider the statistical metrics employed to select the most pertinent models during internal validation. The coefficient of determination (R²), RMSE and SD were used, following the approach that Taylor introduced nearly 30 years ago and that is applied in well-established predictive fields. While these metrics were deemed appropriate for our study, it is important to recognise that fields like climatology use far larger datasets, and the applicability of these metrics in our context may be limited.22,29
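As an illustration of the metrics named above, the following is a minimal sketch of how R², RMSE and SD might be computed for one laboratory parameter. It is not the published study code: the function name `taylor_statistics` and its return keys are assumptions made here, and the correlation and centred RMS difference are included only because they are the quantities a Taylor diagram relates to the SDs.

```python
import numpy as np


def taylor_statistics(observed, predicted):
    """Summary statistics for comparing predictions with observations.

    Hypothetical helper for illustration; names are not from the study code.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    residuals = observed - predicted
    rmse = np.sqrt(np.mean(residuals ** 2))

    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # Centred RMS difference: RMSE after removing each series' mean
    centred = (predicted - predicted.mean()) - (observed - observed.mean())
    centred_rmsd = np.sqrt(np.mean(centred ** 2))

    return {
        "r2": r2,
        "rmse": rmse,
        "sd_observed": observed.std(),    # population SD (ddof=0)
        "sd_predicted": predicted.std(),
        "correlation": np.corrcoef(observed, predicted)[0, 1],
        "centred_rmsd": centred_rmsd,
    }
```

A prediction that is perfectly correlated with the observations but systematically offset keeps a correlation of 1 while R² and RMSE degrade, which is one reason to read these metrics together rather than in isolation.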
Furthermore, the methodology used for model selection warrants discussion. Traditional machine learning model development involves testing several models in internal validation, followed by external validation, ultimately selecting a single model for predicting all variables. In contrast, we opted to retain, for each variable and time period, the most relevant model from internal validation to proceed to external validation. While this approach was intended to maximise accuracy for each variable–time pair, it did not yield satisfactory predictive performance.
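The per-pair selection strategy described above can be sketched as follows. This is only an illustration under stated assumptions: the function `select_best_per_pair`, the tuple layout and the use of the lowest internal-validation RMSE as the sole criterion are hypothetical simplifications, since the study combined several metrics to judge relevance.

```python
def select_best_per_pair(scores):
    """Keep, for each (variable, period) pair, the model with the lowest RMSE.

    `scores` is an iterable of (variable, period, model_name, rmse) tuples
    from internal validation. Hypothetical layout, for illustration only.
    """
    best = {}
    for variable, period, model_name, rmse in scores:
        key = (variable, period)
        # Replace the current choice only if this model has a lower error
        if key not in best or rmse < best[key][1]:
            best[key] = (model_name, rmse)
    return best


# Placeholder entries: model names, periods and values are invented
internal_validation = [
    ("PT", "period_1", "model_a", 1.2),
    ("PT", "period_1", "model_b", 0.8),
    ("fibrinogen", "period_1", "model_a", 0.5),
    ("fibrinogen", "period_1", "model_b", 0.7),
]
selected = select_best_per_pair(internal_validation)
```

Each selected model would then be carried forward to external validation for its own variable and time period only. Because the choice is made separately for many pairs on the same internal data, some of the selected models may look better than they are, which is one possible reading of the weaker external performance.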
Despite these limitations, our study possesses notable strengths. We have openly published all Python code used in data processing, selection, event creation, model development, selection of the most relevant models and metric calculations. This transparency facilitates reproducibility of our methodology across different settings.
Another strength is the relative homogeneity of the study populations. The focus on hepatic transplantation relies on cohorts where surgeons share similar training backgrounds, resulting in comparable surgical techniques that have evolved in parallel over time. This similarity enhances the comparability between internal and external validation populations.
Finally, although the dataset is substantial within the healthcare domain, it remains relatively small compared to machine learning studies in other scientific fields, such as climatology. Our findings illustrate that even large healthcare datasets may lack sufficient power to develop machine learning models robust enough for routine clinical application.
At present, the results of our study do not support recommending our models for routine clinical practice, at least not without important precautions. However, they provide valuable insights into the potential contributions and necessary developments of machine learning as a clinical support tool. We advocate that model metrics be published whenever such models are used by practitioners, allowing clinicians to exercise discretion based on the quality indicators provided. Just as clinicians critically evaluate outputs from diagnostic tools like ultrasound or haemodynamic monitors, they should assess the quality of predictions from machine learning models.
Our recommendation is to ensure maximal transparency with clinicians considering the use of these models. This involves providing not only the predicted values but also accompanying metrics or graphical representations of predictive quality, enabling clinicians to appropriately weigh the model outputs in their clinical decision-making.
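One way to picture this recommendation is a prediction that always travels with its validation metrics. The sketch below is an assumption for illustration, not part of the published code: the class `PredictionReport`, its fields and the numbers in the usage example are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionReport:
    """A predicted value bundled with the metrics of the model that produced it.

    Hypothetical structure for illustration only.
    """
    variable: str
    predicted_value: float
    unit: str
    r2: float    # coefficient of determination from external validation
    rmse: float  # root mean squared error, in the same unit as the prediction

    def summary(self) -> str:
        # Show the prediction and its quality indicators side by side
        return (
            f"{self.variable}: predicted {self.predicted_value:.1f} {self.unit} "
            f"(external validation R2 = {self.r2:.2f}, "
            f"RMSE = {self.rmse:.1f} {self.unit})"
        )


# Placeholder values, not results from the study
report = PredictionReport("fibrinogen", 2.1, "g/L", 0.35, 0.6)
print(report.summary())
```

Displaying the error in the same unit as the prediction lets a clinician judge directly whether the expected uncertainty is acceptable for the decision at hand.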