Coronavirus case-count data has influenced government policies and drives most epidemiological forecasts. Limited testing is often cited as the main reason so little is known about the COVID-19 pandemic. While expanded testing is laudable, measurement error and selection bias are the two greatest problems limiting our understanding of the pandemic, and neither can be fully addressed by increased testing capacity. In this paper, we demonstrate their impact on estimates of point prevalence and the effective reproduction number. We show that estimates based on the millions of molecular tests performed in the US have the same mean squared error as a small simple random sample. To address this, we present a procedure that combines case-count data and random samples over time to estimate selection propensities from key covariate information. We then combine these selection propensities with epidemiological forecast models to construct a doubly robust estimation method that accounts for both measurement error and selection bias. We apply this method to estimate Indiana's prevalence using case-count, hospitalization, and death data with demographic information; a statewide random molecular sample collected April 25–29; and Facebook's COVID-19 symptom survey. We end with a series of recommendations based on the proposed methodology.
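The claim that millions of non-random tests can carry no more information than a small random sample can be illustrated with an exact algebraic identity for the error of a sample mean (Meng's "data defect correlation"). With R the inclusion indicator, f = n/N the sampling fraction, and σ the population standard deviation of the outcome Y, the error of the sample mean equals ρ_{R,Y} · sqrt((1 − f)/f) · σ, where ρ_{R,Y} is the population correlation between being tested and being infected. Setting its square equal to the variance σ²/m of a simple random sample gives an equivalent sample size m ≈ f / ((1 − f) ρ²). This is a minimal sketch, not the paper's analysis: the population size, prevalence, and five-fold testing rate for infected people are hypothetical, as are the helper names.

```python
import math
import random


def meng_decomposition(y, r):
    """Split the error of the selected-sample mean of y into its factors.

    y: outcome for every population unit (e.g. 1 = currently infected)
    r: inclusion indicator for every unit (1 = tested)
    Returns (error, rho, f, sigma_y) using population moments (divisor N),
    so that error == rho * sqrt((1 - f) / f) * sigma_y up to rounding.
    """
    N = len(y)
    n = sum(r)
    f = n / N                                          # sampling fraction
    ybar_N = sum(y) / N                                # true population mean
    ybar_n = sum(yi for yi, ri in zip(y, r) if ri) / n  # naive sample mean
    sigma_y = math.sqrt(sum((yi - ybar_N) ** 2 for yi in y) / N)
    cov_ry = sum(ri * yi for yi, ri in zip(y, r)) / N - f * ybar_N
    rho = cov_ry / (math.sqrt(f * (1 - f)) * sigma_y)  # data defect correlation
    return ybar_n - ybar_N, rho, f, sigma_y


def effective_srs_size(f, rho):
    """SRS size m whose variance sigma^2 / m matches the squared error
    rho^2 * (1 - f) / f * sigma^2 (SRS finite-population correction ignored)."""
    return f / (1 - f) / rho ** 2


random.seed(0)
N = 200_000
prevalence = 0.02  # hypothetical true point prevalence
y = [1 if random.random() < prevalence else 0 for _ in range(N)]
# Hypothetical selection: infected people are 5x as likely to be tested.
r = [1 if random.random() < (0.25 if yi else 0.05) else 0 for yi in y]

error, rho, f, sigma = meng_decomposition(y, r)
assert math.isclose(error, rho * math.sqrt((1 - f) / f) * sigma, rel_tol=1e-9)
print(f"tested {sum(r)} of {N}; naive prevalence error {error:.4f}")
print(f"rho = {rho:.4f}; equivalent SRS size = {effective_srs_size(f, rho):.1f}")
```

Because ρ² sits in the denominator, even a modest correlation between testing and infection shrinks thousands of tests to the information content of a handful of randomly chosen people.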
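Doubly robust estimation pairs a model for who gets selected with a model for the outcome, and remains consistent if either one is correctly specified. The paper's estimator combines selection propensities with epidemiological forecast models; the sketch below instead shows the generic augmented inverse-probability-weighted (AIPW) estimator of a population prevalence, (1/N) Σ_i [ m(x_i) + R_i (Y_i − m(x_i)) / π(x_i) ], with discrete covariate strata. It assumes selection depends only on the observed stratum x (missing at random given x), an assumption that symptom-driven testing can violate. The age strata, prevalences, testing rates, and the helper `aipw_mean` are all hypothetical.

```python
import random


def aipw_mean(x, r, y, pi, m):
    """Augmented inverse-probability-weighted (doubly robust) population mean.

    x:  covariate stratum for every population unit (e.g. from census demographics)
    r:  1 if the unit was sampled/tested, else 0
    y:  outcome; only read where r == 1
    pi: dict stratum -> assumed selection probability P(R = 1 | x)
    m:  dict stratum -> assumed outcome mean E[Y | x]
    """
    total = 0.0
    for xi, ri, yi in zip(x, r, y):
        total += m[xi]                      # outcome-model prediction for everyone
        if ri:
            total += (yi - m[xi]) / pi[xi]  # inverse-weighted residual correction
    return total / len(x)


random.seed(1)
strata = ["18-39", "40-64", "65+"]                       # hypothetical age groups
true_prev = {"18-39": 0.03, "40-64": 0.02, "65+": 0.01}  # hypothetical prevalences
true_pi = {"18-39": 0.05, "40-64": 0.15, "65+": 0.50}    # hypothetical testing rates
N = 500_000
x = [random.choice(strata) for _ in range(N)]
y = [1 if random.random() < true_prev[s] else 0 for s in x]
r = [1 if random.random() < true_pi[s] else 0 for s in x]

truth = sum(y) / N
n = sum(r)
naive = sum(yi for yi, ri in zip(y, r) if ri) / n

# Case 1: correct propensities, wrong outcome model (one pooled mean).
pooled = {s: naive for s in strata}
dr_good_pi = aipw_mean(x, r, y, true_pi, pooled)

# Case 2: wrong propensities (a flat sampling fraction), outcome model fit per
# stratum from the sample; valid here because selection depends only on x.
flat_pi = {s: n / N for s in strata}
m_hat = {}
for s in strata:
    ys = [yi for xi, ri, yi in zip(x, r, y) if ri and xi == s]
    m_hat[s] = sum(ys) / len(ys)
dr_good_m = aipw_mean(x, r, y, flat_pi, m_hat)

print(f"truth {truth:.4f}  naive {naive:.4f}  "
      f"DR(pi ok) {dr_good_pi:.4f}  DR(m ok) {dr_good_m:.4f}")
```

The naive tested-sample prevalence is pulled toward the heavily tested, low-prevalence 65+ group, while both doubly robust variants land near the true prevalence even though each uses one misspecified model. In the second case the estimator reduces to post-stratification, since the weighted residuals sum to zero within each stratum.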