The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has emphasized the importance and challenges of correctly interpreting antibody test results. Identifying positive and negative samples requires a classification strategy with low error rates, which is hard to achieve when the corresponding measurement values overlap. Additional uncertainty arises when classification schemes fail to account for complicated structure in the data. We address these problems through a mathematical framework that combines high-dimensional data modeling and optimal decision theory. Specifically, we show that appropriately increasing the dimension of the data better separates positive and negative populations and reveals nuanced structure that can be described by mathematical models. We combine these models with optimal decision theory to yield a classification scheme that better separates positive and negative samples than traditional methods such as confidence intervals (CIs) and receiver operating characteristic (ROC) analysis. We validate the usefulness of this approach on a multiplex salivary SARS-CoV-2 immunoglobulin G (IgG) assay dataset. This example illustrates how our analysis: (i) improves assay accuracy (e.g. lowers classification errors by up to 42% compared to CI methods); (ii) reduces the number of indeterminate samples when an inconclusive class is permissible (e.g. by 40% compared to the original analysis of the example multiplex dataset); and (iii) decreases the number of antigens needed to classify samples. Our work showcases the power of mathematical modeling in diagnostic classification and highlights a method that can be adopted broadly in public health and clinical settings.
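The abstract does not state the decision rule explicitly. The sketch below assumes the standard minimum-error rule from optimal decision theory: a measurement vector r is labeled positive when the prevalence-weighted positive density exceeds the weighted negative density, p·P(r) > (1 − p)·N(r). Everything else is a hypothetical illustration, not the authors' models or the salivary IgG data: the two-antigen synthetic measurements, the independent (diagonal) Gaussian models, the "any antigen above the negative mean + 3 SD" cutoff used as a stand-in for a CI-based threshold, and all function names.

```python
import math
import random
import statistics

# Hypothetical synthetic measurements for two antigens (illustrative only).
def simulate(n, mean, sd, rng):
    return [tuple(rng.gauss(m, s) for m, s in zip(mean, sd)) for _ in range(n)]

def fit_diag_gaussian(samples):
    """Assumed model: independent normal distribution in each dimension."""
    dims = list(zip(*samples))
    return [statistics.fmean(d) for d in dims], [statistics.stdev(d) for d in dims]

def density(r, model):
    """Product of per-dimension normal densities evaluated at r."""
    mean, sd = model
    out = 1.0
    for x, m, s in zip(r, mean, sd):
        out *= math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return out

def optimal_classify(r, pos_model, neg_model, prevalence):
    """Positive iff p * P(r) > (1 - p) * N(r): the minimum-error (Bayes) rule."""
    return prevalence * density(r, pos_model) > (1 - prevalence) * density(r, neg_model)

def ci_classify(r, neg_model, k=3.0):
    """Hypothetical CI-style baseline: positive if any antigen exceeds mean + k SD of negatives."""
    mean, sd = neg_model
    return any(x > m + k * s for x, m, s in zip(r, mean, sd))

def error_rate(pos, neg, rule):
    """Fraction of all test samples misclassified (false negatives + false positives)."""
    fn = sum(not rule(r) for r in pos)
    fp = sum(rule(r) for r in neg)
    return (fn + fp) / (len(pos) + len(neg))

rng = random.Random(0)
NEG_MEAN, POS_MEAN, SD = (0.0, 0.0), (2.5, 2.5), (1.0, 1.0)
neg_model = fit_diag_gaussian(simulate(500, NEG_MEAN, SD, rng))
pos_model = fit_diag_gaussian(simulate(500, POS_MEAN, SD, rng))
test_neg = simulate(2000, NEG_MEAN, SD, rng)
test_pos = simulate(2000, POS_MEAN, SD, rng)
p = 0.5  # assumed prevalence, matching the balanced synthetic test set

opt_err = error_rate(test_pos, test_neg,
                     lambda r: optimal_classify(r, pos_model, neg_model, p))
ci_err = error_rate(test_pos, test_neg, lambda r: ci_classify(r, neg_model))
print(f"optimal-decision error: {opt_err:.3f}")
print(f"3-SD cutoff error:      {ci_err:.3f}")
```

In this toy setting the positives are shifted along both antigens at once, so the density-based rule uses the joint two-dimensional information, while the per-antigen cutoff treats each antigen separately and misses many positives. The exact error values depend entirely on the assumed distributions.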