Single-cell RNA sequencing (scRNA-seq) measurements have facilitated genome-scale transcriptomic profiling of individual cells, with the hope of deconvolving dynamic cellular changes in the corresponding cell sub-populations to better understand the molecular mechanisms of different developmental processes. Several scRNA-seq analysis methods first identify cell sub-populations by clustering and then separately perform differential expression analysis to understand gene expression changes. Their statistical models and inference algorithms are often designed disjointly. We develop a new method -- SimCD -- that explicitly models cell heterogeneity and dynamic differential changes in one unified hierarchical gamma-negative binomial (hGNB) model, allowing simultaneous cell clustering and differential expression analysis for scRNA-seq data. Our method naturally defines cell heterogeneity by dynamic expression changes, which is expected to yield better performance on the two tasks than existing methods that perform them separately. In addition, SimCD better models dropout (zero inflation) in scRNA-seq data through both cell- and gene-level factors, and it obviates the need for sophisticated pre-processing steps such as normalization, because the hGNB model operates directly on scRNA-seq count data and is fitted with an efficient Gibbs sampling inference algorithm. Extensive comparisons with state-of-the-art methods on both simulated and real-world scRNA-seq count data demonstrate the capability of SimCD to discover cell clusters and capture dynamic expression changes. Furthermore, SimCD helps identify several known genes affected by food deprivation in hypothalamic neuron cell subtypes, as well as some new potential markers, suggesting its usefulness for biomarker discovery.
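To make the count-modeling idea concrete, the following is a minimal sketch of the generic gamma-Poisson mixture that underlies negative binomial models of count data. It is not the SimCD/hGNB model itself, which adds hierarchical cell- and gene-level structure not specified here; the function name `sample_gamma_poisson` and the parameter values `r` and `p` are assumptions chosen purely for illustration.

```python
import numpy as np


def sample_gamma_poisson(r, p, size, rng):
    """Draw counts from a gamma-Poisson mixture (illustrative helper, not from SimCD).

    lambda ~ Gamma(shape=r, scale=p / (1 - p)),  y | lambda ~ Poisson(lambda).
    Marginally y is negative binomial with mean r*p/(1-p)
    and variance r*p/(1-p)**2, i.e. overdispersed relative to Poisson.
    """
    lam = rng.gamma(shape=r, scale=p / (1.0 - p), size=size)
    return rng.poisson(lam)


rng = np.random.default_rng(0)
r, p = 2.0, 0.6  # hypothetical dispersion and probability parameters

counts = sample_gamma_poisson(r, p, size=200_000, rng=rng)

# Theoretical moments of the negative binomial marginal.
expected_mean = r * p / (1 - p)
expected_var = r * p / (1 - p) ** 2
expected_zero_frac = (1 - p) ** r  # zeros arise even without explicit dropout

zero_frac = (counts == 0).mean()
print(f"mean: {counts.mean():.3f} (theory {expected_mean:.3f})")
print(f"var:  {counts.var():.3f} (theory {expected_var:.3f})")
print(f"zero fraction: {zero_frac:.3f} (theory {expected_zero_frac:.3f})")
```

The variance exceeding the mean is what makes this family a common choice for raw sequencing counts, where a plain Poisson model would underestimate the spread.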