Applications of single-cell RNA sequencing (scRNA-seq) in biomedical research have grown rapidly. The technology provides unprecedented opportunities to study disease heterogeneity at the cellular level. However, characteristics specific to scRNA-seq data, including high dimensionality, high dropout rates, and possible batch effects, make such data difficult to analyze, and failing to address these issues appropriately can obstruct scientific discovery. Herein, we propose a unified Regularized Zero-inflated Mixture Model framework designed for scRNA-seq data (RZiMM-scRNA) that simultaneously detects cell subgroups and identifies differentially expressed genes based on a proposed importance score, while accounting for both dropouts and batch effects. We conduct extensive simulation studies to evaluate the performance of RZiMM-scRNA and compare it with several popular methods, including Seurat, SC3, K-means, and hierarchical clustering. In these simulations, RZiMM-scRNA shows better clustering performance and more accurate biomarker detection than the alternative methods, especially when cell subgroups are less distinct, which supports the robustness of our method. Our empirical investigations focus on two brain tumor studies of astrocytoma of various grades, including the most malignant of all brain tumors, glioblastoma multiforme (GBM). Our goal is to delineate cell heterogeneity and identify driving biomarkers associated with these tumors. Notably, RZiMM-scRNA successfully identifies a small group of oligodendrocyte cells, which have drawn much attention in the biomedical literature on brain cancers.