Evaluation of race/ethnicity-specific survival machine learning models for Hispanic and Black patients with breast cancer

Abstract

Objectives Survival machine learning (ML) has been suggested as a useful approach for forecasting future events, but a growing concern exists that ML models have the potential to cause racial disparities through the data used to train them. This study aims to develop race/ethnicity-specific survival ML models for Hispanic and black women diagnosed with breast cancer to examine whether race/ethnicity-specific ML models outperform the general models trained with all races/ethnicity data.

Methods We used the data from the US National Cancer Institute’s Surveillance, Epidemiology and End Results programme registries. We developed the Hispanic-specific and black-specific models and compared them with the general model using the Cox proportional-hazards model, Gradient Boost Tree, survival tree and survival support vector machine.

Results A total of 322 348 female patients who had breast cancer diagnoses between 1 January 2000 and 31 December 2017 were identified. The race/ethnicity-specific models for Hispanic and black women consistently outperformed the general model when predicting the outcomes of specific race/ethnicity.

Discussion Accurately predicting the survival outcome of a patient is critical in determining treatment options and providing appropriate cancer care. The high-performing models developed in this study can contribute to providing individualised oncology care and improving the survival outcome of black and Hispanic women.

Conclusion Predicting the individualised survival outcome of breast cancer can provide the evidence necessary for determining treatment options and high-quality, patient-centred cancer care delivery for under-represented populations. Also, the race/ethnicity-specific ML models can mitigate representation bias and contribute to addressing health disparities.

Introduction

Breast cancer is the second-leading cause of cancer-related deaths in women in the USA, and it affects every ethnic group of women in the USA.1 2 However, there are racial and ethnic divides in cancer survival. Breast cancer is the leading cause of cancer-related death in Hispanic women in the USA.3 Also, minority women, especially black women, have a higher mortality rate (26.8 per 100 000 women) than white women (18.8 per 100 000 women), even though white women have a higher cancer incidence.2 4 5 These facts indicate that cancer survival rates need to be improved among Hispanic and black women, and that the various features contributing to breast cancer mortality should be understood in order to provide tailored interventions for enhanced survival.

Unlike traditional survival models that use a standard statistical method, survival machine learning (ML) has been suggested as a useful approach for learning patterns from high-dimensional data and complex feature interactions to forecast future events.6 This approach allows healthcare professionals to identify patients at high risk, or to predict those who will need increased utilisation of healthcare services, so that they can proactively provide the necessary support and interventions.7 However, a growing concern exists that ML models have the potential to cause racial disparities through the data used to train them.8 An ML model trained with data representing the general population may not contain a sufficient number of participants from minority populations and can therefore be biased, resulting in inaccurate predictions for the minority group even if the overall accuracy is high.9 If ML models trained with data poorly representative of minority groups are used in healthcare, they may exacerbate health disparities.10 To address such harmful effects, it is recommended to train an ML model with data that resemble the population for which the model is intended.11 12 To the best of our knowledge, no study has developed race/ethnicity-specific survival ML models for Hispanic and black women diagnosed with breast cancer.

Therefore, there is a need to develop race/ethnicity-specific survival ML models trained with data from underrepresented populations and to examine whether such models can outperform the general model trained with all races/ethnicity. Accurate prediction of the individualised outcome will enable tailored healthcare delivery and a better outcome for the underrepresented populations. This study aims to develop race/ethnicity-specific survival ML models for Hispanic and black women diagnosed with breast cancer to examine whether race/ethnicity-specific ML models outperform the models trained with the general population data when predicting the survival of Hispanic and black women diagnosed with breast cancer.

Discussion

Accurately predicting the survival outcome of a patient is critical in determining treatment options and providing appropriate cancer care. ML approaches provide a robust way of predicting health outcomes from large numbers of data points with complex feature interactions. However, current ML models are often built with all races/ethnicity data, which can introduce representation bias, and are not tailored to each minority group. To date, race/ethnicity-specific survival ML models predicting the outcomes of black and Hispanic women diagnosed with breast cancer are lacking. This study developed and evaluated race/ethnicity-specific survival ML models for black and Hispanic women with breast cancer and compared them with the general population model. The high-performing ML models developed in this study can contribute to providing individualised oncology care and improving the survival outcome of two specific populations, black and Hispanic women. Also, it is a strength of our models that we used the patient data of 322 348 women in a large, population-based dataset from 2000 to 2017, including 59 204 (18.4%) Hispanic women and 20 073 (6.2%) black women.
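As a quick arithmetic check of how small the two subgroups are relative to the full cohort, the sketch below recomputes the reported percentages from the counts given above. The variable names are illustrative only and nothing beyond the three counts is taken from the study.

```python
# Counts as reported in the text; variable names are illustrative.
total_patients = 322_348
hispanic_patients = 59_204
black_patients = 20_073

# Share of each subgroup in the full cohort, in percent.
hispanic_share = 100 * hispanic_patients / total_patients
black_share = 100 * black_patients / total_patients

print(f"Hispanic: {hispanic_share:.1f}%  Black: {black_share:.1f}%")
```

Together the two groups make up roughly a quarter of the cohort, which is the imbalance that a general model trained on all patients would inherit.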

The sample population in this study showed that the black population had the highest death rate, followed by the Hispanic population and then all races/ethnicity, supporting the findings from other literature.4 5 Also, the black and Hispanic groups had fewer survival months and were younger than all races/ethnicity. This is congruent with the literature showing that young black women have higher breast cancer mortality than young white women,16 17 and that Latinas have higher rates of more advanced cancer than non-Hispanic whites.18 Also, breast cancer is more aggressive in younger women than in older premenopausal women.19 Our study sample also showed that the Hispanic and black populations had a higher percentage of poorly differentiated grade III cancer than the overall population. Poorly differentiated tumours lack normal features, tend to grow and spread faster and have a worse prognosis20; and these tumours expressed lower levels of oestrogen receptor.21 Our study sample likewise showed that the Hispanic and black populations had a lower percentage of oestrogen receptor positive status and progesterone receptor positive status than the overall population. Studies have shown that breast cancer at a young age has a more advanced stage at presentation, a higher grade and higher oestrogen receptor negativity.22

The results also showed that lower percentages of the Hispanic and black populations had chemotherapy. Existing literature has shown that African American and Hispanic patients tend to experience diagnostic and treatment delays, which were related to worse survival outcomes.23 24 The lower percentages of Hispanic and black patients receiving chemotherapy may have been associated with the fewer survival months of the Hispanic and black populations in this study.

After the race/ethnicity-specific model development and evaluation, we observed that the general models trained with all races/ethnicity did not perform as well when tested with specific races/ethnicity. That is, the race/ethnicity-specific survival ML models developed in this study consistently outperformed the general models when predicting the outcomes of specific race/ethnicity, addressing bias in ML. In particular, the black-specific and Hispanic-specific models using the Cox proportional-hazards (Cox PH) approach showed the best performance among the four ML models tested. Also, the survival tree (ST) model showed the largest performance difference between the race/ethnicity-specific model and the general model. This indicates that the ST model tends to overfit to a specific race/ethnicity compared with the other models. Our study demonstrated that a tailored ML model for each race/ethnicity is needed to predict patient survival better than the general ML model using all races/ethnicity. By accurately forecasting a patient’s survival, healthcare professionals will be able to guide individualised treatment decisions and provide tailored interventions for the well-being of a cancer survivor.
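The kind of comparison described above can be illustrated with a small sketch. This excerpt does not name the performance metric, so the sketch assumes Harrell's concordance index (C-index), a common choice for survival models. The function, the toy follow-up data and both sets of risk scores are hypothetical and are not taken from the study.

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Harrell's C-index: among comparable pairs, the fraction in which the
    patient with the earlier observed death has the higher risk score.
    Tied risk scores count as half a concordant pair."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if times differ and the shorter one ended in an event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable

# Hypothetical held-out patients from one subgroup (months, 1 = death observed).
times = [5, 12, 20, 30, 44, 60]
events = [1, 1, 0, 1, 0, 0]

# Hypothetical risk scores from a subgroup-specific and a general model.
specific_scores = [0.9, 0.7, 0.5, 0.4, 0.2, 0.1]
general_scores = [0.6, 0.8, 0.5, 0.3, 0.4, 0.1]

c_specific = concordance_index(times, events, specific_scores)
c_general = concordance_index(times, events, general_scores)
print(f"specific: {c_specific:.3f}  general: {c_general:.3f}")
```

On this toy data the hypothetical subgroup-specific scores rank all 11 comparable pairs correctly, while the general scores mis-rank 2 of them. The point of the sketch is the evaluation design: both models are scored on the same held-out patients from the target subgroup, so any gap reflects the model rather than the test set.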

It is worth noting that although the performance of the general model is not low, it was trained with the general population, in which underrepresented groups such as the Hispanic and black populations make up an imbalanced portion. It was still meaningful to examine the feasibility of race/ethnicity-specific models, since it is recommended to train an ML model with data resembling the people for whom the model is intended in order to mitigate representation bias. Although the performance difference between the models was sometimes marginal depending on the algorithm, our race/ethnicity-specific models consistently outperformed the general model. This shows their potential to accurately predict individualised patient outcomes for quality care delivery for underrepresented populations and to help alleviate health disparities.

There are several limitations to this study. The SEER database only includes the first course of treatment and does not have information on adjuvant therapy.25 This makes it difficult to compare the outcomes of the treatment sequence. To overcome this limitation, a comprehensive database that has more information on cancer treatment could be used in future work to provide additional insights into the impact of treatment sequence. Also, the dataset did not include the human epidermal growth factor receptor 2 status, which is a critical tumour marker for breast cancer prognosis. The variable was missing because it has only been collected since 2010, whereas our data date from 2000. Incorporating this variable in the modelling will be needed in future work to provide more accurate predictions of patient outcomes.

Conclusion

This study has developed and evaluated accurate race/ethnicity-specific survival ML models for black and Hispanic women diagnosed with breast cancer. Predicting the individualised survival outcome of breast cancer can provide the evidence necessary for determining treatment options and high-quality, patient-centred cancer care delivery for underrepresented populations. Also, the race/ethnicity-specific ML models can mitigate representation bias and contribute to addressing health disparities.
