The study of immune cellular composition has attracted great scientific interest in immunology owing to the generation of multiple large-scale datasets. From a statistical point of view, such immune cellular data should be treated as compositional. In compositional data, every element is positive and the elements sum to a constant, which can be set to one without loss of generality. Standard statistical methods are not directly applicable to compositional data because they do not appropriately handle the correlations among the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses based on log-ratio transformations and on the generalized linear model with a Dirichlet distribution, discuss their theoretical foundations, and illustrate their application to immune cellular fraction data from colorectal cancer patients.
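To make the notion of compositional data and log-ratio transformations concrete, the following is a minimal sketch, assuming NumPy is available. The function names (`closure`, `alr`, `clr`) and the cell counts are illustrative assumptions and are not taken from the paper; the additive and centered log-ratio transformations shown are two common choices, and the paper may use others.

```python
import numpy as np

def closure(x):
    """Rescale each row of positive values so that it sums to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(p):
    """Additive log-ratio: log of each part relative to the last part."""
    p = np.asarray(p, dtype=float)
    return np.log(p[..., :-1] / p[..., -1:])

def clr(p):
    """Centered log-ratio: log of each part relative to the row's geometric mean."""
    logp = np.log(np.asarray(p, dtype=float))
    return logp - logp.mean(axis=-1, keepdims=True)

# Hypothetical counts for two samples and three cell types (made-up numbers).
counts = np.array([[10.0, 30.0, 60.0],
                   [25.0, 25.0, 50.0]])

fractions = closure(counts)   # each row is now a composition summing to one
alr_coords = alr(fractions)   # shape (2, 2): one fewer column than parts
clr_coords = clr(fractions)   # shape (2, 3): each row sums to zero
```

Because log-ratios depend only on the relative sizes of the parts, `alr(counts)` and `alr(fractions)` coincide. The transformed coordinates are unconstrained real numbers, which is what lets them serve as responses in an ordinary regression.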