Single-cell reference atlases are large-scale, cell-level maps that capture cellular heterogeneity within an organ using single-cell genomics. Given their size and cellular diversity, these atlases serve as high-quality training data for transferring cell type labels to new datasets. Such label transfer, however, must be robust to domain shifts in gene expression caused by measurement technique, lab-specific factors, and more general batch effects. Robust transfer therefore requires methods that provide uncertainty estimates for the cell type predictions, so that the predictions can be interpreted correctly. Here, for the first time, we introduce uncertainty quantification methods for cell type classification on single-cell reference atlases. We benchmark four model classes and show that currently used models lack calibration, robustness, and actionable uncertainty scores. Furthermore, we demonstrate that models that quantify uncertainty are better suited to detecting unseen cell types in the setting of atlas-level cell type transfer.
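To make the notions of calibration and actionable uncertainty scores concrete, the following is a minimal sketch in Python with NumPy. It is not the benchmark or any of the four model classes referred to above. The function names, the 10-bin expected calibration error, the normalized-entropy score, and the 0.5 threshold are all illustrative assumptions; the only input assumed is a matrix of predicted cell type probabilities from some classifier.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between confidence and accuracy over confidence bins.

    probs  : (n_cells, n_types) predicted class probabilities (assumed input)
    labels : (n_cells,) integer indices of the true cell types
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidence = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)

    # Equal-width bins on [0, 1]; bin b covers (edges[b], edges[b + 1]].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidence, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of cells in the bin
    return ece


def normalized_entropy(probs, eps=1e-12):
    """Predictive entropy scaled to [0, 1]; higher means more uncertain."""
    probs = np.asarray(probs, dtype=float)
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return entropy / np.log(probs.shape[1])


def flag_possibly_unseen(probs, threshold=0.5):
    """Flag cells whose uncertainty exceeds a (hypothetical) threshold."""
    return normalized_entropy(probs) > threshold


if __name__ == "__main__":
    # Toy predictions over three cell types: one confident, one near-uniform.
    toy_probs = np.array([[0.90, 0.05, 0.05],
                          [0.34, 0.33, 0.33]])
    toy_labels = np.array([0, 1])
    print("ECE:", expected_calibration_error(toy_probs, toy_labels))
    print("flagged:", flag_possibly_unseen(toy_probs))
```

In this sketch, a well-calibrated model yields a small calibration error, and cells with near-uniform predicted probabilities are flagged as candidates for cell types absent from the reference. In practice the threshold would need to be chosen on held-out data.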