Determining who is at risk from a disease is important for protecting vulnerable subpopulations during an outbreak. We are currently in a pandemic of COVID-19, the disease caused by SARS-CoV-2, which has had a massive impact across the world, with some communities and individuals at higher risk of severe outcomes and death than others. These risks are compounded for people of lower socioeconomic status, those with limited access to health care, and those with higher rates of chronic diseases such as hypertension, type 2 diabetes, and obesity, which are likely due to the chronic stress of these living conditions. Essential workers are also at higher risk of COVID-19 because the nature of their work leads to higher rates of exposure. In this study we determine the important features of the pandemic in California, in terms of cumulative cases and deaths per 100,000 population up to 5 July 2021 (the date of analysis), using Pearson correlation coefficients between population demographic features and cumulative cases and deaths. The most highly correlated features, ranked by the absolute value of their Pearson correlation coefficients with cases or deaths per 100,000, were used to create regression models in two ways: using the top 5 features, and using the top 20 features after filtering to limit interactions between features. These models were used to determine (a) the most significant features within these subsets and (b) features that approximate different potential forces on COVID-19 cases and deaths (especially for the latter set). Additionally, co-correlations, defined as demographic features that are not within a given input feature set for the regression models but are strongly correlated with the features included in it, were calculated for all features.
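The workflow described above (rank features by absolute Pearson correlation, filter the ranked list to limit interactions, fit a regression, then list co-correlated features) can be sketched as follows. This is a minimal illustration on synthetic data, not the study's actual code or data: the function names, the greedy filtering rule, the 0.8 correlation threshold, and the use of ordinary least squares are all assumptions made for the example.

```python
import numpy as np

def pearson_with_target(X, y):
    """Pearson r between each column of X and the outcome y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = (Xc * yc[:, None]).sum(axis=0)
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return num / den

def rank_features(X, y, k):
    """Indices of the k features with the largest |r| against y."""
    r = pearson_with_target(X, y)
    return np.argsort(-np.abs(r))[:k]

def filter_correlated(X, ranked, max_abs_r=0.8):
    """Assumed greedy rule: walk the ranked list and keep a feature only if
    its |r| with every already-kept feature is below max_abs_r."""
    corr = np.corrcoef(X, rowvar=False)
    kept = []
    for j in ranked:
        if all(abs(corr[j, i]) < max_abs_r for i in kept):
            kept.append(int(j))
    return kept

def fit_ols(X, y, cols):
    """Ordinary least squares with an intercept; returns [intercept, betas...]."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def co_correlated(X, cols, min_abs_r=0.8):
    """Features outside `cols` that are strongly correlated with a feature in it.
    Maps each such outside feature to the list of included features it tracks."""
    corr = np.corrcoef(X, rowvar=False)
    result = {}
    for j in range(X.shape[1]):
        if j in cols:
            continue
        partners = [i for i in cols if abs(corr[j, i]) >= min_abs_r]
        if partners:
            result[j] = partners
    return result

# Synthetic stand-in data (hypothetical): 200 regions, 4 demographic features.
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)
x1 = x0 + 0.05 * rng.normal(size=n)   # near-duplicate of x0
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)               # unrelated to the outcome
X = np.column_stack([x0, x1, x2, x3])
y = 3.0 * x0 + 2.0 * x2 + 0.1 * rng.normal(size=n)  # e.g. cases per 100,000

ranked = rank_features(X, y, k=3)
kept = filter_correlated(X, ranked)
coef = fit_ols(X, y, kept)
co = co_correlated(X, kept)
```

Here `kept` plays the role of a filtered input feature set, and `co` lists the features excluded from that set that nevertheless move closely with an included feature, which is how the co-correlations defined above would be read alongside the regression coefficients.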