An accurate multiclass classification strategy is crucial to interpreting antibody tests. However, traditional methods based on confidence intervals or receiver operating characteristics lack clear extensions to settings with more than two classes. We address this problem by developing a multiclass classification scheme based on probabilistic modeling and optimal decision theory that minimizes a convex combination of false classification rates. Classification is challenging when the relative fraction of the population in each class, or generalized prevalence, is unknown. Thus, we also develop a method for estimating the generalized prevalence of test data that is independent of classification. We validate our approach on serological data with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) naïve, previously infected, and vaccinated classes. Synthetic data are used to demonstrate that (i) prevalence estimates are unbiased and converge to true values and (ii) our procedure applies to arbitrary measurement dimensions. In contrast to the binary problem, the multiclass setting offers wide-reaching utility as the most general framework and provides new insight into prevalence estimation best practices.
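To illustrate why the generalized prevalence matters for classification, the following is a minimal sketch, not the authors' implementation. It assumes a single scalar antibody measurement and Gaussian class-conditional models with made-up parameters (`CLASS_MODELS` and `classify` are hypothetical names). It uses the textbook decision rule that assigns a sample to the class maximizing prevalence times conditional density, which minimizes the prevalence-weighted misclassification probability when the models and prevalences are known.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical class-conditional models (mean, std) for one scalar
# measurement. These numbers are illustrative assumptions, not fitted values.
CLASS_MODELS = {
    "naive": (0.0, 1.0),
    "previously_infected": (3.0, 1.0),
    "vaccinated": (6.0, 1.0),
}


def classify(measurement, prevalence):
    """Return the class maximizing prevalence * conditional density."""
    scores = {
        name: prevalence[name] * gaussian_pdf(measurement, mean, std)
        for name, (mean, std) in CLASS_MODELS.items()
    }
    return max(scores, key=scores.get)


# The same measurement can be assigned to different classes
# depending on the assumed generalized prevalence.
uniform = {name: 1.0 / 3.0 for name in CLASS_MODELS}
skewed = {"naive": 0.90, "previously_infected": 0.05, "vaccinated": 0.05}

print(classify(1.6, uniform))
print(classify(1.6, skewed))
```

With equal prevalences the boundary between the first two classes sits midway between their means, so a measurement of 1.6 falls on the previously infected side. When most of the population is assumed naïve, the boundary shifts and the same measurement is assigned to the naïve class. This is why an independent prevalence estimate is useful before classifying.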