COVID-19 has a spectrum of disease severity, ranging from asymptomatic infection to cases requiring hospitalization. Providing appropriate medical care to severely ill patients is crucial to reducing mortality risk. Hence, in classifying patients into severity categories, the more consequential classification errors are "under-diagnosis" errors, in which patients are misclassified into less severe categories and thus receive insufficient medical care. The Neyman-Pearson (NP) classification paradigm was developed to prioritize a designated type of error. However, current NP procedures either apply only to binary classification or do not provide high-probability control of the prioritized errors in multi-class classification. Here, we propose a hierarchical NP (H-NP) framework and an umbrella algorithm that adapts to popular classification methods and controls the under-diagnosis errors with high probability. On an integrated collection of single-cell RNA-seq (scRNA-seq) datasets for 740 patients, we explore several featurization approaches and demonstrate the efficacy of the H-NP algorithm in controlling the under-diagnosis errors regardless of featurization. Beyond COVID-19 severity classification, the H-NP algorithm applies to multi-class classification problems in which the classes have a priority order.
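To make "controlling a prioritized error with high probability" concrete, the following is a minimal sketch of the order-statistic rule used in the binary NP umbrella setting. It is not the H-NP algorithm itself. It assumes a scoring function where higher scores favor the non-prioritized class, and n held-out scores from the prioritized class. The names `violation_bound`, `np_threshold_rank`, and `np_threshold` are hypothetical and chosen only for illustration. If the threshold is set at the k-th smallest held-out score, the probability that the prioritized error exceeds the target level alpha is bounded by a binomial tail, and the sketch picks the smallest k for which that bound is at most delta.

```python
from math import comb


def violation_bound(n, k, alpha):
    """Binomial-tail bound on P(prioritized error > alpha) when the threshold
    is the k-th smallest of n held-out prioritized-class scores."""
    return sum(
        comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_threshold_rank(n, alpha, delta):
    """Smallest rank k whose violation bound is at most delta.
    Returns None when n is too small for any rank to qualify."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    return None


def np_threshold(held_out_scores, alpha, delta):
    """Pick the score threshold from held-out prioritized-class scores
    (illustrative helper; assumes higher score = evidence for the other class)."""
    k = np_threshold_rank(len(held_out_scores), alpha, delta)
    if k is None:
        raise ValueError("too few held-out scores for the requested alpha and delta")
    return sorted(held_out_scores)[k - 1]
```

With alpha = delta = 0.05, this rule needs at least 59 held-out scores from the prioritized class, because the bound at the largest rank is (1 - alpha)^n, which falls below 0.05 only at n = 59. The multi-class, hierarchical setting described in the abstract requires controlling several such errors jointly, which this binary sketch does not attempt.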