Diagnostic tests are almost never perfect. Studies quantifying their performance rely on knowledge of the true health status, measured with a reference diagnostic test. Researchers commonly assume that the reference test is perfect, which is not the case in practice. When the assumption fails, conventional studies identify "apparent" performance, that is, performance with respect to the reference, but not true performance. This paper provides the smallest possible bounds on the measures of true performance, namely sensitivity (true positive rate) and specificity (true negative rate), or equivalently the false negative and false positive rates, in standard settings. Implied bounds on policy-relevant parameters are derived: 1) prevalence in screened populations; 2) predictive values. Methods for inference based on moment inequalities are used to construct confidence sets that are uniformly consistent in level over a relevant family of data distributions. Emergency Use Authorization (EUA) and independent study data for the BinaxNOW COVID-19 antigen test demonstrate that the bounds can be very informative. Analysis reveals that the estimated false negative rates for symptomatic and asymptomatic patients are up to 3.89 and 5.42 times higher, respectively, than the frequently cited "apparent" false negative rate.