TrendNCart

Evaluating machine learning algorithms for predicting HIV status among young Thai men who have sex with men

Discussion

Our study shows that ML models can aid HIV epidemic control using real-world data. Among the evaluated models, XGB with SMOTE-processed data achieved the highest accuracy in predicting HIV infection among young MSM. SHAP analysis identified key risk factors: earlier calendar year, older age at diagnosis and targeted HIV testing. These findings support the use of ML for HIV prediction, targeted interventions and prevention planning in Thailand. Integrating ML with real-world data can enhance prediction accuracy, inform public health strategies and optimise prevention efforts for KPs.
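The core idea behind SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be illustrated with a minimal sketch. This is not the study's pipeline: the function name `smote_sketch`, the two-feature toy data and the parameter values are assumptions made for illustration, and a real analysis would normally rely on an established library implementation.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical minority class with two scaled features (illustrative values only).
minority = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.25)]
new_points = smote_sketch(minority, n_new=6)
```

Because each synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already occupied by the minority class, which is why oversampling of this kind cannot add information that the original features lack.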

In our study, we used electronic health record data from the UHC programme, and the models showed low precision after applying SMOTE, likely due to the low prevalence of HIV and high engagement in HIV testing. Similar findings from studies in the USA [25] and Denmark [15] highlight the challenge of class imbalance in predicting HIV infection. To address this, we applied a weighted average approach, adjusting class weights inversely to their frequencies. This improved sensitivity to rare HIV-positive cases while maintaining balance with precision. In health settings, minimising both false positives and false negatives is crucial. The F1-score with weighted average accounts for both errors, making it a better metric for assessing model performance in HIV prediction than accuracy alone [26]. Our model has the potential to support clinicians in identifying individuals at higher risk of acquiring HIV and linking them to preventive services such as the pre-exposure prophylaxis registry programme [27]. Moving forward, real-world validation and optimisation of ML algorithms will be crucial to improving their practical application in public health settings.

Our findings show that the XGB model achieved a weighted average F1-score of 0.77, demonstrating its ability to balance precision and recall while also achieving the highest AUC and sensitivity. These results are consistent with previous studies that highlight the effectiveness of ML algorithms in predicting HIV infection [16]. A study that applied ML approaches to predict HIV and sexually transmitted infections (STIs) among MSM in Australia reported that Gradient Boosting achieved the highest AUC for HIV prediction (76.3%), followed by XGB, RF, deep learning and LR. These results highlight the advantages of ML approaches over traditional LR models in predicting HIV among MSM [17]. More recently, a study from Zimbabwe also found that the XGB model demonstrated the highest performance in predicting HIV infection in the general population [18]. Similarly, our findings align with a study conducted among MSM in Zhejiang, China, from 2018 to 2020, which applied SMOTE to address data set imbalance. That study reported an HIV infection rate of 6% and identified the RF model as the best-performing algorithm (recall=0.775 and AUC=0.942) when compared with conventional LR models [19]. The usefulness of the SMOTE process for generating synthetic samples and addressing imbalanced biomedical data is further supported by findings from studies predicting HIV status in Danish registries [15, 28].
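To make the metrics discussed above concrete, the following sketch computes inverse-frequency class weights and a support-weighted average F1-score on toy labels. The exact formulas used in the study are not stated here, so this is an assumption: the weights follow the common "balanced" heuristic n_samples / (n_classes × class_count), the weighted F1 averages per-class F1 by class support, and all function names and labels are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def f1_for_class(y_true, y_pred, cls):
    """F1-score treating cls as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = sum(f1_for_class(y_true, y_pred, c) * m for c, m in support.items())
    return total / len(y_true)

# Hypothetical labels: 1 = HIV positive (rare), 0 = HIV negative.
y_true = [0] * 8 + [1] * 2
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
weights = inverse_frequency_weights(y_true)  # rare class gets the larger weight
score = weighted_f1(y_true, y_pred)
```

In this toy example the rare positive class receives a weight four times that of the negative class, so errors on positive cases count more heavily during training, while the weighted F1 summarises precision and recall across both classes in a single figure.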

Additionally, our study observed an increasing trend in HIV testing among young MSM under the UHC programme in Thailand, accompanied by a decrease in the proportion of HIV infections during the study period. This decline is likely attributable to the effectiveness of the test-and-treat intervention and access to the PEPFAR programme in high-risk regions [29, 30]. Our findings indicate that HIV prevalence among MSM during the study period was lower than the prevalence reported in previous studies from China [19, 31]. Findings from conventional LR analysis further revealed that young MSM aged 20–24 years had higher odds of HIV infection, consistent with findings from studies conducted in China and Mozambique [11, 31]. Moreover, recent advances in HIV testing and the scaling up of treatment underscore the commitment to achieving better treatment coverage and higher long-term viral suppression rates among people with HIV in Thailand [7, 32]. These significant factors were also reflected in the feature importance rankings identified in our study using the XGB model. Targeted efforts for MSM, the most affected group, are crucial to reducing HIV transmission and achieving global targets.

Our study supports existing ML research on predicting HIV infection. Taken together, these studies demonstrate the effectiveness of ML in detecting HIV infection among MSM using real-world data sets. Our study had some limitations. First, only a limited number of variables, primarily demographic factors, were included; important sexual behaviour factors such as condomless anal sex, substance abuse and history of STIs could not be used, which may have limited the depth of analysis of the predictors of HIV infection among MSM. Second, ML models require large amounts of high-quality data and may not characterise the association between outcome and predictors in as much detail as conventional regression analysis. Lastly, in populations with low HIV positivity, ML models may have limited predictive power, especially with weak predictive features. Other sampling strategies, such as Adaptive Synthetic Sampling, were explored to oversample the minority class, but these resulted in lower accuracy. Additionally, adjusting decision thresholds (to better classify individuals at high risk for HIV), incorporating cost-sensitive learning (to minimise unnecessary testing and follow-up visits or to prioritise detecting true HIV-positive cases) and optimising the F1-score or balanced accuracy could further enhance the utility of the model in real-world HIV screening programmes. Incorporating ML in HIV prediction enhances disease understanding and supports public health goals by guiding targeted interventions.
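The threshold adjustment and cost-sensitive ideas mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not the study's method: the scores, labels and the costs assigned to false positives and false negatives are hypothetical, and in practice such costs would need to be set with programme staff and clinicians.

```python
def confusion_at(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when scores >= threshold are called positive."""
    tp = fp = fn = tn = 0
    for truth, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and truth:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def best_threshold(y_true, scores, cost_fp=1.0, cost_fn=5.0):
    """Pick the candidate threshold with the lowest total misclassification cost.

    cost_fp and cost_fn are hypothetical costs: here a missed HIV-positive
    case is assumed to be five times as costly as an unnecessary follow-up.
    """
    def total_cost(threshold):
        _, fp, fn, _ = confusion_at(y_true, scores, threshold)
        return cost_fp * fp + cost_fn * fn

    return min(sorted(set(scores)), key=total_cost)

# Hypothetical predicted probabilities and true labels (1 = HIV positive).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.90]
threshold = best_threshold(y_true, scores)
default_counts = confusion_at(y_true, scores, 0.5)
tuned_counts = confusion_at(y_true, scores, threshold)
```

With these toy values, the default cut-off of 0.5 misses one positive case, whereas the cost-minimising threshold of 0.40 detects all three positives at the price of the same single false positive. Changing the assumed cost ratio shifts the chosen threshold, which is the trade-off a screening programme would need to decide explicitly.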
