To make informed public policy decisions in battling the ongoing COVID-19 pandemic, it is important to know the disease prevalence in a population. There are two intertwined difficulties in estimating this prevalence from the test results of a group of subjects. First, the test is prone to measurement error, with unknown sensitivity and specificity. Second, the prevalence tends to be low at the initial stage of the pandemic, so we may not be able to determine whether a positive test result is a false positive caused by the imperfect specificity of the test. Statistical inference based on large-sample approximations or the conventional bootstrap may not be sufficiently reliable and may yield confidence intervals that do not cover the true prevalence at the nominal level. In this paper, we propose a set of 95% confidence intervals whose validity is guaranteed and does not depend on the sample size in the unweighted setting. For the weighted setting, the proposed inference is equivalent to a class of hybrid bootstrap methods, whose performance is also more robust to the sample size than that of methods based on asymptotic approximations. The methods are used to reanalyze data from a study of antibody prevalence in Santa Clara County, California, which was the motivating example for this research, as well as data from several other seroprevalence studies whose authors had tried to correct their estimates for test performance. Extensive simulation studies have been conducted to examine the finite-sample performance of the proposed confidence intervals.
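To illustrate why imperfect specificity matters at low prevalence, the following minimal sketch applies the standard moment-based correction (often called the Rogan-Gladen estimator), which inverts the relation apparent = prevalence × sensitivity + (1 − prevalence) × (1 − specificity). This is not the confidence interval procedure proposed in the paper. It yields only a point estimate and treats sensitivity and specificity as known, which is exactly the assumption the abstract says fails in practice. The function name and all numbers in the usage note are hypothetical.

```python
def corrected_prevalence(positives, n, sensitivity, specificity):
    """Moment-based (Rogan-Gladen) point estimate of true prevalence.

    Assumes sensitivity and specificity are known constants; this is an
    illustrative simplification, not the method proposed in the paper.
    """
    if n <= 0 or not 0 <= positives <= n:
        raise ValueError("need n > 0 and 0 <= positives <= n")
    if sensitivity + specificity <= 1:
        raise ValueError("test must be informative: sensitivity + specificity > 1")

    # Fraction of subjects who tested positive (the apparent prevalence).
    apparent = positives / n

    # Invert: apparent = prev * sens + (1 - prev) * (1 - spec).
    raw = (apparent + specificity - 1) / (sensitivity + specificity - 1)

    # The raw estimate can leave [0, 1] when the apparent prevalence is
    # below the false-positive rate, so clip it to the valid range.
    return min(1.0, max(0.0, raw))
```

With hypothetical inputs, `corrected_prevalence(10, 1000, 0.8, 0.98)` is clipped to zero, because an apparent prevalence of 1% is below the 2% false-positive rate implied by a specificity of 0.98: every positive result could be a false positive. Small changes in the assumed specificity therefore change the estimate substantially, which is why uncertainty in the test's performance has to enter the interval.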