Cell type deconvolution is a computational approach for inferring the proportions of individual cell types from bulk transcriptomics data. Although many new methods have been developed for cell type deconvolution, most of them provide only point estimates of the cell type proportions. However, these estimates can be very noisy due to various sources of bias and randomness, and ignoring their uncertainty may greatly affect the validity of downstream analyses. In this paper, we propose a comprehensive statistical framework for cell type deconvolution and construct asymptotically valid confidence intervals both for each individual's cell type proportions and for quantifying how cell type proportions change across multiple individuals with bulk samples in downstream regression analyses. Our analysis takes into account various factors, including the biological randomness of gene expression across cells and individuals, gene-gene dependence, cross-platform biases, and sequencing errors, and it avoids any parametric assumptions on the data distributions. We also provide identification conditions for the cell type proportions in the presence of arbitrary platform-specific biases across sequencing technologies.
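To make the notion of a point estimate concrete, the sketch below shows one common way such an estimate can be computed: non-negative least squares on a cell-type signature matrix, followed by renormalization so the proportions sum to one. It is an illustration only and is not the framework proposed in this paper. The function name `deconvolve_point_estimate`, the toy signature matrix, and the noise-free bulk sample are all assumptions made for the example; the solver is a plain projected-gradient loop chosen to keep the code dependent on NumPy alone.

```python
import numpy as np


def deconvolve_point_estimate(signature, bulk, n_iter=2000):
    """Hypothetical helper: estimate cell type proportions from one bulk sample.

    signature: (genes x cell types) mean expression per cell type (assumed known).
    bulk:      (genes,) bulk expression vector for one individual.
    Solves min_p 0.5 * ||signature @ p - bulk||^2 subject to p >= 0 by
    projected gradient descent, then rescales p to sum to one.
    """
    n_types = signature.shape[1]
    p = np.full(n_types, 1.0 / n_types)  # start from uniform proportions
    # Step size 1/L, where L is the squared largest singular value of the
    # signature matrix (the Lipschitz constant of the gradient).
    step = 1.0 / np.linalg.norm(signature, 2) ** 2
    for _ in range(n_iter):
        grad = signature.T @ (signature @ p - bulk)
        p = np.maximum(p - step * grad, 0.0)  # project onto p >= 0
    total = p.sum()
    if total == 0.0:
        raise ValueError("all estimated proportions are zero")
    return p / total


# Toy data (assumed for illustration): 6 genes, 3 cell types.
signature = np.array([
    [10.0, 1.0, 1.0],
    [8.0, 2.0, 1.0],
    [1.0, 9.0, 2.0],
    [2.0, 10.0, 1.0],
    [1.0, 1.0, 12.0],
    [1.0, 2.0, 9.0],
])
true_props = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_props  # noise-free bulk sample

estimate = deconvolve_point_estimate(signature, bulk)
print(np.round(estimate, 3))
```

The result is a single vector of proportions with no accompanying measure of uncertainty. With real data, noise in the bulk sample and in the signature matrix would propagate into this estimate, which is the gap that confidence intervals are meant to address.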