High-throughput RNA-sequencing (RNA-seq) technologies are powerful tools for understanding cellular state. Often it is of interest to quantify and summarize changes in cell state that occur between experimental or biological conditions. Differential expression is typically assessed using univariate tests to measure gene-wise shifts in expression. However, these methods largely ignore changes in transcriptional correlation. Furthermore, there is a need to characterize the low-dimensional structure of the gene expression shift in order to identify collections of genes that change between conditions. Here, we propose contrastive latent variable models designed for count data to create a richer portrait of differential expression in sequencing data. These models disentangle the sources of transcriptional variation in different conditions, in the context of an explicit model of variation at baseline. Moreover, we develop a model-based hypothesis testing framework that can test for global and gene subset-specific changes in expression. We test our model through extensive simulations and analyses with count-based gene expression data from perturbation and observational sequencing experiments. We find that our methods can effectively summarize and quantify complex transcriptional changes in case-control experimental sequencing data.