
Social vulnerability and initial COVID-19 community spread in the US South: a machine learning approach


Results

Table 1 provides sample characteristics of the 16 SVIs, along with COVID-19 cases and COVID-19 rates per 100 000 population 30 days after the first COVID-19-positive case, for all counties in the 11 states of the US South (1086 counties). On average, 85.3 COVID-19 cases had been reported in a county 30 days after its first reported case, with a maximum of 6119 cases. The corresponding rate averaged 139.5 COVID-19 cases per 100 000 population, with a maximum of 4026.8 cases per 100 000 population.
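The rates above are case counts scaled to a common population base. A minimal sketch of that arithmetic follows; the function name and the example county are hypothetical and are not taken from the study data.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Return cases per 100 000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Hypothetical county (illustrative numbers only, not from Table 1).
example_rate = rate_per_100k(cases=85, population=60_000)
```

Scaling by population lets a small county and a large county be compared on the same footing, which raw counts do not allow.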

Table 1

Descriptive statistics of the 16 SVIs and COVID-19 cases and COVID-19 rates per 100 000 population after 30 days of the first COVID-19-positive cases in all counties in the US South (1086 counties)

To evaluate the accuracy of our model, we tested its predictions on the 217 counties in the test data set. The goodness-of-fit and prediction metrics (adjusted R-squared=0.59, root mean square error (RMSE)=92.36) indicate that the model was robust (online supplemental table A2). Online supplemental figure A5 shows a calibration plot of the predicted versus observed COVID-19 rates.

Figure 2 shows the XGBoost relative gain importance of each feature. The percentage of mobile homes in a county is the most important feature in predicting the growth of COVID-19 within 30 days of the first case, followed by population density per square mile and per capita income. The relative contributions of these three features to the model's predictions are 0.35, 0.12 and 0.12, respectively. Percentage of housing in structures with 10+ units, percentage of people below poverty and percentage of people with no high school diploma have relative contributions of 0.10, 0.08 and 0.04, respectively. The percentage of overoccupied housing units and the percentage of institutionalised group quarters are the least important features in the model, with relative gains of 0.003 and 0.002, respectively.
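For readers unfamiliar with the two evaluation metrics, the sketch below computes RMSE and adjusted R-squared using their standard textbook definitions. The text does not state the exact formulas used, so treating the adjustment as based on the number of test observations and the 16 features is an assumption, and the observed and predicted values are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_features):
    """Penalise R-squared for the number of predictors (standard formula)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_features - 1)


# Made-up values for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

toy_rmse = rmse(observed, predicted)
toy_r2 = r_squared(observed, predicted)
# Assumed sizes: 217 test counties and 16 features, as described in the text.
toy_adj_r2 = adjusted_r_squared(toy_r2, n_obs=217, n_features=16)
```

RMSE is expressed in the same units as the outcome, so it reads as a typical prediction error, whereas adjusted R-squared is unitless.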

Figure 2

Extreme Gradient Boosting (XGBoost) gain relative importance. The measures are all reported as relative amounts and all sum up to 1.0.
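To illustrate why each importance measure sums to 1.0, the sketch below aggregates per-split statistics into relative gain, relative cover and relative frequency. It assumes the common convention that each measure is a per-feature total divided by the total over all features; the split records and feature names are hypothetical and do not come from the fitted model.

```python
from collections import defaultdict


def relative_importance(splits):
    """Aggregate (feature, gain, cover) split records into relative measures.

    Each measure is the per-feature total divided by the total over all
    features, so every measure sums to 1.0 across features.
    """
    gain = defaultdict(float)
    cover = defaultdict(float)
    count = defaultdict(int)
    for feature, split_gain, split_cover in splits:
        gain[feature] += split_gain
        cover[feature] += split_cover
        count[feature] += 1

    total_gain = sum(gain.values())
    total_cover = sum(cover.values())
    total_count = sum(count.values())
    return {
        feature: {
            "gain": gain[feature] / total_gain,
            "cover": cover[feature] / total_cover,
            "frequency": count[feature] / total_count,
        }
        for feature in gain
    }


# Hypothetical split nodes: (feature, gain of the split, observations covered).
toy_splits = [
    ("mobile_homes", 6.0, 100.0),
    ("pop_density", 3.0, 60.0),
    ("mobile_homes", 1.0, 40.0),
    ("per_capita_income", 2.0, 50.0),
]
importance = relative_importance(toy_splits)
```

Because the three measures are normalised separately, a feature can rank high on gain while ranking lower on cover or frequency.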

The relative cover for percentage of mobile homes, population density per square mile and per capita income is 0.09, 0.12 and 0.07, respectively. Relative cover reflects the relative proportion of counties in our sample that are affected by splits on these features across all the decision trees (online supplemental figure A3). The relative cover for percentage of housing in structures with 10+ units, percentage of people below poverty and percentage of people with no high school diploma is 0.7, 0.06 and 0.06, respectively.

Relative frequency is calculated as the proportion of decision tree nodes that include a specific feature. Percentage of mobile homes, population density per square mile and per capita income occurred in 0.069, 0.093 and 0.079 of the nodes within the trees of the model, respectively (online supplemental table A4). In addition, percentage of housing in structures with 10+ units, percentage of people below poverty and percentage of people with no high school diploma accounted for 0.059, 0.085 and 0.061 of the nodes, respectively. Additional XGBoost feature importance matrix details can be found in online supplemental table A3.

Figure 3 shows the results of the SHAP analysis. Population density per square mile, percentage of housing in structures with 10+ units and percentage of people below poverty had the most positive impact on the number of COVID-19 cases in a county. Per capita income and the percentage of people aged 17 and younger had the most negative impact on the number of COVID-19 cases in a county.

Figure 3

Shapley additive explanations (SHAP) analysis results.
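The signed impacts reported by SHAP rest on Shapley values, which split a prediction among features by averaging each feature's marginal contribution over all coalitions of the other features. The sketch below computes exact Shapley values by brute-force enumeration for a toy value function. It is not the tree-based algorithm that SHAP tooling typically uses for XGBoost models, and the toy function, its coefficients and the feature names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of features."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * marginal
        phi[feature] = total
    return phi


def toy_prediction(present):
    """Hypothetical model output given the set of features 'switched on'."""
    value = 100.0  # baseline prediction with no features present
    if "pop_density" in present:
        value += 30.0
    if "per_capita_income" in present:
        value -= 20.0
    if "poverty" in present:
        value += 10.0
    if "pop_density" in present and "poverty" in present:
        value += 10.0  # interaction term, shared equally by both features
    return value


toy_features = ["pop_density", "per_capita_income", "poverty"]
toy_shap = shapley_values(toy_prediction, toy_features)
```

In this toy setting the values sum to the difference between the full prediction and the baseline, and a negative value (here for per capita income) marks a feature that pushes the prediction down.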

Online supplemental table A4 shows the XGBoost relative gain importance 60 days after each county's first COVID-19 case. Population density per square mile is the most important feature in predicting the growth of COVID-19 within 60 days of the first case, with a relative gain of 31.8%. It is followed by percentage of housing in structures with 10+ units and percentage of mobile homes, with relative gains of 30.4% and 11.2%, respectively. Percentage of people aged 65 and older, per capita income and percentage of people aged 5+ who speak limited English have relative gains of 5.5%, 4.9% and 2.6%, respectively. Additional XGBoost feature importance matrix details can be found in online supplemental table A4.


