Formulating accurate and robust classification strategies is a key challenge in developing diagnostic and antibody tests. Methods that do not explicitly account for disease prevalence, and for uncertainty in that prevalence, can lead to significant classification errors. We present a novel method that leverages optimal decision theory to address this problem. As a preliminary step, we develop an analysis that uses an assumed prevalence and conditional probability models of diagnostic measurement outcomes to define classification domains that are optimal in the sense of minimizing rates of false positives and false negatives. Critically, we demonstrate how this strategy can be generalized to a setting in which the prevalence is unknown by either: (i) defining a third class of hold-out samples that require further testing; or (ii) using an adaptive algorithm to estimate prevalence prior to defining classification domains. We also provide examples for a recently published SARS-CoV-2 serology test and discuss how measurement uncertainty (e.g., associated with instrumentation) can be incorporated into the analysis. We find that our new strategy decreases classification error by up to a decade (a factor of ten) relative to more traditional methods based on confidence intervals. Moreover, it establishes a theoretical foundation for generalizing techniques such as receiver operating characteristics (ROC) by connecting them to the broader field of optimization.
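To make the preliminary step concrete, the following is a minimal sketch of a prevalence-weighted decision rule, not the authors' implementation. It assumes a scalar measurement r, an assumed prevalence, and two conditional densities for positive and negative samples. The Gaussian models, their parameters, and the names `gaussian_pdf`, `pos_pdf`, `neg_pdf`, and `classify` are all hypothetical choices made for illustration. A sample is assigned to the positive domain when the prevalence-weighted positive density exceeds the weighted negative density, which is the standard Bayes rule for minimizing the prevalence-weighted sum of false-negative and false-positive probabilities.

```python
import math


def gaussian_pdf(r, mean, std):
    """Density of a normal distribution evaluated at r."""
    z = (r - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical conditional models of the measurement outcome r.
# These parameters are illustrative assumptions, not values from the paper.
def pos_pdf(r):
    return gaussian_pdf(r, mean=3.0, std=1.0)


def neg_pdf(r):
    return gaussian_pdf(r, mean=0.0, std=1.0)


def classify(r, prevalence, pos_pdf, neg_pdf):
    """Return True (positive) when the prevalence-weighted positive
    density exceeds the prevalence-weighted negative density at r."""
    return prevalence * pos_pdf(r) > (1.0 - prevalence) * neg_pdf(r)


if __name__ == "__main__":
    # The same measurement can change class as the assumed prevalence changes.
    for q in (0.5, 0.1):
        label = "positive" if classify(2.0, q, pos_pdf, neg_pdf) else "negative"
        print(f"prevalence={q}: r=2.0 classified as {label}")
```

With these assumed models, a measurement of r = 2.0 falls in the positive domain at a prevalence of 0.5 but in the negative domain at a prevalence of 0.1, which illustrates why a classification boundary fixed without reference to prevalence can misclassify samples.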