The study of immune cellular composition is of great scientific interest in immunology, and multiple large-scale datasets have recently been generated to support this investigation. From a statistical point of view, immune cellular composition data are compositional data, which convey relative information. In compositional data, each element is positive and all the elements together sum to a constant, which can generally be set to one. Standard statistical methods are not directly applicable to compositional data because they do not appropriately handle the correlations among elements that this sum constraint induces. As this type of data has become more widely available, statistical strategies that account for the compositional features of the data are increasingly needed. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio and Dirichlet approaches, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
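To make the unit-sum constraint and the log-ratio idea concrete, the following is a minimal sketch in Python with NumPy. The function names (`closure`, `clr`, `alr`) and the cell fractions are hypothetical and chosen only for illustration. The centered and additive log-ratio transforms shown here are common choices in compositional data analysis and are not necessarily the specific transforms used in this paper.

```python
import numpy as np


def closure(x):
    """Rescale each row of strictly positive parts so that it sums to one."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("all parts must be strictly positive")
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    log_parts = np.log(closure(x))
    return log_parts - log_parts.mean(axis=-1, keepdims=True)


def alr(x):
    """Additive log-ratio, using the last part as the reference."""
    comp = closure(x)
    return np.log(comp[..., :-1] / comp[..., -1:])


# Hypothetical fractions of three cell types in two samples (illustration only).
fractions = np.array([
    [0.50, 0.30, 0.20],
    [0.10, 0.60, 0.30],
])

clr_values = clr(fractions)  # each row sums to zero
alr_values = alr(fractions)  # one fewer column than the input
```

The transformed values are no longer bound by the unit-sum constraint in the original scale, which is why log-ratio coordinates are often used as inputs to standard regression models. Zero fractions would need separate handling, since the logarithm is undefined at zero.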