In the era of single-cell sequencing, there is a growing need to extract insights from data with clustering methods. Here, we introduce Forest Fire Clustering, an efficient and interpretable method for cell-type discovery from single-cell data. Forest Fire Clustering makes minimal prior assumptions and, unlike current approaches, calculates a non-parametric posterior probability that each cell is assigned a given cell-type label. These posterior distributions allow the label confidence of each cell to be evaluated and enable the computation of "label entropies," which highlight transitions along developmental trajectories. Furthermore, we show that Forest Fire Clustering can make robust, inductive inferences in an online-learning context and can readily scale to millions of cells. Finally, we demonstrate that our method outperforms state-of-the-art clustering approaches on diverse benchmarks of simulated and experimental data. Overall, Forest Fire Clustering is a useful tool for rare cell-type discovery in large-scale single-cell analysis.
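To make the notions of label confidence and "label entropy" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a per-cell posterior over cell-type labels is already available as a matrix (rows are cells, columns are labels), that confidence is taken as the largest posterior probability, and that label entropy is the Shannon entropy of each row in bits. The function name `label_confidence_and_entropy` is hypothetical.

```python
import numpy as np


def label_confidence_and_entropy(posteriors):
    """Summarize a per-cell label posterior (assumed input, not the paper's API).

    posteriors: array-like of shape (n_cells, n_labels) with non-negative
    entries; each row is renormalized to sum to 1.
    Returns (confidence, entropy): the largest posterior probability per cell
    and the Shannon entropy per cell in bits.
    """
    p = np.asarray(posteriors, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)

    # Compute log2 only where p > 0, so that 0 * log(0) is treated as 0.
    logs = np.log2(p, where=p > 0, out=np.zeros_like(p))
    entropy = -(p * logs).sum(axis=1) + 0.0  # "+ 0.0" avoids returning -0.0

    confidence = p.max(axis=1)
    return confidence, entropy


if __name__ == "__main__":
    # Three illustrative cells: one unambiguous, one split between two
    # labels, and one spread evenly over three labels.
    example = [
        [1.0, 0.0, 0.0],
        [0.5, 0.5, 0.0],
        [1 / 3, 1 / 3, 1 / 3],
    ]
    conf, ent = label_confidence_and_entropy(example)
    print(conf)
    print(ent)
```

Under these assumptions, a cell whose posterior mass sits on a single label has zero entropy, while a cell split between labels, as might occur at a transition along a developmental trajectory, has higher entropy and lower confidence.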