There are many sources of data on the number of SARS-CoV-2 infections in the population, but all have major drawbacks, including biases and delayed reporting. For example, the number of confirmed cases substantially underestimates the number of infections, deaths lag infections considerably, and test positivity rates tend to greatly overestimate prevalence. Representative random prevalence surveys, the only putatively unbiased source, are sparse in time and space, and their results arrive with a long delay. Reliable estimates of population prevalence are necessary for understanding the spread of the virus and the effects of mitigation strategies. We develop a simple Bayesian framework to estimate viral prevalence by combining the main available data sources. It is based on a discrete-time SIR model with a time-varying reproductive parameter. Our model includes likelihood components that incorporate data on deaths due to the virus, confirmed cases, and the number of tests administered on each day. We anchor our inference with data from random-sample testing surveys in Indiana and Ohio. We use the results from these two states to calibrate the model on positive test counts and then estimate the infection fatality rate and the number of new infections on each day in each state in the USA. We estimate the extent to which reported COVID cases have underestimated true infection counts; this underestimation was large, especially in the first months of the pandemic. We explore the implications of our results for progress towards herd immunity.
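The latent process underlying such a framework can be illustrated with a minimal forward simulation of a discrete-time SIR model driven by a time-varying reproduction number. This is a sketch under stated assumptions, not the authors' implementation: the parameterisation `beta_t = r_t * gamma`, the recovery rate `gamma = 1/7`, and the helper names `simulate_sir` and `expected_deaths` are hypothetical. The death model is a simplification that assumes a fixed fraction `ifr` of infections die exactly `lag` days later. In a Bayesian fit, quantities like these would feed the likelihood terms for deaths, cases, and tests.

```python
import numpy as np


def simulate_sir(r_t, n_pop, i0, gamma=1 / 7):
    """Discrete-time SIR with a time-varying reproduction number.

    Hypothetical parameterisation: transmission rate beta_t = r_t[t] * gamma.
    Returns daily new infections and the S, I, R trajectories (length T + 1).
    """
    T = len(r_t)
    s = np.empty(T + 1)
    i = np.empty(T + 1)
    r = np.empty(T + 1)
    s[0], i[0], r[0] = n_pop - i0, i0, 0.0
    new_inf = np.empty(T)
    for t in range(T):
        beta = r_t[t] * gamma
        # New infections on day t, capped so S never goes negative.
        new_inf[t] = min(beta * s[t] * i[t] / n_pop, s[t])
        recoveries = gamma * i[t]
        s[t + 1] = s[t] - new_inf[t]
        i[t + 1] = i[t] + new_inf[t] - recoveries
        r[t + 1] = r[t] + recoveries
    return new_inf, s, i, r


def expected_deaths(new_inf, ifr, lag):
    """Hypothetical death model: a fraction `ifr` of infections die `lag` days later."""
    deaths = np.zeros_like(new_inf, dtype=float)
    if lag >= len(new_inf):
        return deaths  # no infections old enough to produce deaths yet
    deaths[lag:] = ifr * new_inf[: len(new_inf) - lag]
    return deaths


# Usage: constant reproduction number of 1.5 for 100 days.
new_inf, s, i, r = simulate_sir(np.full(100, 1.5), n_pop=1_000_000, i0=10)
deaths = expected_deaths(new_inf, ifr=0.005, lag=21)
```

A realistic model would replace the fixed lag with a delay distribution and place priors on `r_t` (for example, a random walk) rather than fixing it; the sketch only shows how a reproduction-number path maps to infections and, in turn, to expected deaths.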