A data table arranged according to two factors can often be treated as a compositional table. An example is the number of unemployed people, split by gender and age class. When such a table is analyzed as a composition, the relevant information is contained in the ratios between its cells. This is particularly useful when several compositional tables are analyzed jointly and their absolute numbers lie in very different ranges, e.g. when unemployment data from different countries are compared. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle when exploring the relationships between the given factors. Here we propose a special choice of coordinates with a direct relation to centered logratio (clr) coefficients, which are particularly useful for an interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (PCA) is performed for dimension reduction, so that the relationships between the factors can be investigated. The link between orthonormal coordinates and clr coefficients makes it possible to apply robust PCA, which would otherwise suffer from the singularity of clr coefficients.
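To make the role of the clr coefficients concrete, the following is a minimal sketch, not the authors' implementation and not the special coordinates proposed here. It assumes the standard clr definition (log of each cell minus the mean log over all cells) and splits the clr table into an additive part (row plus column effects, corresponding to the independent part) and a double-centered remainder (corresponding to the interactive part). The counts and the function names `clr` and `decompose` are hypothetical and chosen only for illustration.

```python
import numpy as np


def clr(table):
    """Centered logratio coefficients of a table with strictly positive cells."""
    logs = np.log(np.asarray(table, dtype=float))
    # Subtracting the mean log equals dividing by the geometric mean of all cells.
    return logs - logs.mean()


def decompose(table):
    """Split the clr table into an independent and an interaction part (in clr terms)."""
    z = clr(table)
    row_effect = z.mean(axis=1, keepdims=True)   # one value per row (e.g. gender)
    col_effect = z.mean(axis=0, keepdims=True)   # one value per column (e.g. age class)
    independent = row_effect + col_effect        # additive part in clr terms
    interaction = z - independent                # double-centered remainder
    return z, independent, interaction


# Hypothetical unemployment counts: rows = gender, columns = age classes.
counts = np.array([[120.0, 340.0, 210.0],
                   [95.0, 310.0, 260.0]])

z, independent, interaction = decompose(counts)

# clr coefficients always sum to zero; this is the singularity that prevents
# a direct application of robust PCA to them.
print(np.isclose(z.sum(), 0.0))

# Only ratios matter: rescaling all counts leaves the clr coefficients unchanged.
print(np.allclose(clr(10.0 * counts), z))
```

Because the interaction part has zero row and column means, it is orthogonal to the independent part in the usual Euclidean sense. Several tables (e.g. one per country) would each contribute one such vector of coefficients to a joint analysis.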