There is growing interest in cell-type-specific analysis of bulk samples that contain a mixture of different cell types. A critical first step in such analyses is the accurate estimation of cell-type proportions in a bulk sample. Although many methods have been proposed recently, quantifying the uncertainties associated with the estimated cell-type proportions has not been well studied. Ignoring these uncertainties can lead to missed or false findings in downstream analyses. In this article, we introduce a flexible statistical deconvolution framework that allows a general and subject-specific covariance of bulk gene expression. Under this framework, we propose a decorrelated constrained least squares method called DECALS that estimates cell-type proportions as well as the sampling distribution of the estimates. Simulation studies demonstrate that DECALS can accurately quantify the uncertainties in the estimated proportions, whereas other methods fail to do so. Applying DECALS to bulk gene expression data of postmortem brain samples from the ROSMAP and GTEx projects, we show that taking into account the uncertainties in the estimated cell-type proportions can lead to more accurate identification of cell-type-specific differentially expressed genes and transcripts between different subject groups, such as Alzheimer's disease patients versus controls and males versus females.
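To make the constrained least squares idea concrete, the sketch below estimates cell-type proportions by minimizing the squared error between a bulk expression vector and a mixture of cell-type signatures, subject to the proportions being non-negative and summing to one. It is not DECALS: it assumes uncorrelated, equal-variance gene expression (no decorrelation step) and returns only point estimates, with no sampling distribution. The function names, the signature matrix, and the proportions are hypothetical values chosen for illustration, and the solver (projected gradient descent) is a generic choice rather than the one used in the article.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        # The condition holds for a prefix of the sorted values;
        # the last candidate that satisfies it is the shift to apply.
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(signature, bulk, n_iter=2000):
    """Constrained least squares via projected gradient descent.

    signature: list of rows, one per gene, each with one entry per cell type
               (hypothetical reference expression levels).
    bulk:      list of bulk expression values, one per gene.
    Assumes identity covariance across genes, unlike the framework
    described in the abstract.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The squared Frobenius norm bounds the gradient's Lipschitz constant,
    # so this step size is safe for the objective 0.5 * ||S p - y||^2.
    step = 1.0 / sum(s * s for row in signature for s in row)
    p = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        p = project_to_simplex([p[k] - step * grad[k] for k in range(n_types)])
    return p


# Illustrative, made-up data: 4 genes, 3 cell types, noiseless mixture.
signature = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 9.0],
    [3.0, 3.0, 3.0],
]
true_p = [0.6, 0.3, 0.1]
bulk = [sum(row[k] * true_p[k] for k in range(3)) for row in signature]

estimate = estimate_proportions(signature, bulk)
print([round(x, 4) for x in estimate])
```

In this noiseless toy setting the estimate recovers the assumed proportions. With noisy or correlated gene expression, a point estimate of this kind carries uncertainty that the sketch does not quantify, which is the gap the abstract describes.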