Modeling rates of disease with missing categorical data

Covariates such as age, sex, and race/ethnicity give public health authorities invaluable insight when interpreting surveillance data collected during a public health emergency such as the COVID-19 pandemic. However, the utility of such data is limited when many cases are missing key covariates. The problem is most concerning when missingness likely depends on the values of the missing covariates themselves, i.e., when the data are not missing at random (NMAR). We propose a Bayesian parametric model that leverages joint information on spatial variation in the disease and covariate missingness processes and can accommodate both missing-at-random (MAR) and NMAR missingness. We show that the model is locally identifiable when the spatial distribution of the population covariates is known and observed cases can be associated with a spatial unit of observation. We also use a simulation study to investigate the model's finite-sample performance. We compare our model's performance on NMAR data against complete-case analysis and multiple imputation (MI), both of which are commonly used by public health researchers when confronted with missing categorical covariates. Finally, we model spatial variation in cumulative COVID-19 incidence in Wayne County, Michigan using data from the Michigan Department of Health and Human Services. The analysis suggests that population relative risk estimates by race during the early part of the COVID-19 pandemic in Michigan were understated for non-white residents compared to white residents when cases missing race were dropped or had these values imputed using MI.
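To make the concern about NMAR missingness concrete, the following minimal sketch works through the expected-value arithmetic for a complete-case relative risk estimate. It is not the Bayesian model proposed here; all populations, case counts, recording probabilities, and group labels (`group_a`, `group_b`) are hypothetical numbers chosen only for illustration.

```python
# Hypothetical illustration: complete-case analysis under NMAR missingness.
# None of these numbers come from the Michigan data; they are assumptions.

def relative_risk(cases_a, pop_a, cases_b, pop_b):
    """Ratio of the incidence rate in group A to the incidence rate in group B."""
    return (cases_a / pop_a) / (cases_b / pop_b)

# Known population sizes for two covariate groups (assumed).
population = {"group_a": 100_000, "group_b": 100_000}

# True case counts: group A has twice the rate of group B (assumed).
true_cases = {"group_a": 2_000, "group_b": 1_000}

# Probability that the covariate is recorded for a case. It depends on the
# covariate value itself, which is what makes the missingness NMAR (assumed).
p_recorded = {"group_a": 0.5, "group_b": 0.8}

# Expected number of cases with the covariate observed in each group.
observed_cases = {g: true_cases[g] * p_recorded[g] for g in true_cases}

true_rr = relative_risk(
    true_cases["group_a"], population["group_a"],
    true_cases["group_b"], population["group_b"],
)

# Complete-case analysis drops cases with a missing covariate.
complete_case_rr = relative_risk(
    observed_cases["group_a"], population["group_a"],
    observed_cases["group_b"], population["group_b"],
)

print(f"true relative risk:          {true_rr:.2f}")           # 2.00
print(f"complete-case relative risk: {complete_case_rr:.2f}")  # 1.25
```

In this toy setting the complete-case estimate equals the true relative risk multiplied by the ratio of recording probabilities (0.5 / 0.8), so the higher-risk group's relative risk is understated whenever its cases are less likely to have the covariate recorded.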
