Correspondence analysis is a dimension reduction method for visualizing nonnegative data sets, in particular contingency tables, but its results depend on the marginals of the data set. Two transformations of the data have been proposed to render correspondence analysis invariant to row and column scales; both change the initial form of the data set into a bistochastic form. The power transformation applied by Greenacre (2010) has one positive parameter, whereas the transformation applied by Mosteller (1968) and Goodman (1996) has (I+J) positive parameters: the raw data are row and column scaled by the Sinkhorn (RAS or IPF) algorithm to render them bistochastic. Goodman (1996) named the correspondence analysis of a bistochastic matrix marginal-free correspondence analysis. We discuss these two transformations and further generalize the Mosteller-Goodman approach.
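To make the row and column scaling concrete, the following is a minimal sketch of Sinkhorn (RAS, IPF) scaling, not the authors' implementation. It assumes a strictly positive I x J table and takes "bistochastic" to mean uniform margins, with every row summing to 1/I and every column to 1/J; the function name `sinkhorn_scale`, the tolerance, and the example table are illustrative choices that do not come from the paper.

```python
import numpy as np

def sinkhorn_scale(N, tol=1e-10, max_iter=10_000):
    """Scale a strictly positive I x J table to uniform margins.

    Returns P = diag(a) @ N @ diag(b) together with the I row factors a
    and the J column factors b (the I + J positive parameters).
    Assumption: every entry of N is > 0, which guarantees convergence.
    """
    N = np.asarray(N, dtype=float)
    I, J = N.shape
    row_target = np.full(I, 1.0 / I)  # assumed uniform row margins
    col_target = np.full(J, 1.0 / J)  # assumed uniform column margins
    a = np.ones(I)
    b = np.ones(J)
    for _ in range(max_iter):
        a = row_target / (N @ b)      # fit the row margins
        b = col_target / (N.T @ a)    # fit the column margins
        P = a[:, None] * N * b[None, :]
        # Column margins are exact after the b update; check the rows.
        if np.abs(P.sum(axis=1) - row_target).max() < tol:
            break
    return P, a, b

# Hypothetical 2 x 3 table, used only for illustration.
N = np.array([[10.0, 2.0, 3.0],
              [4.0, 5.0, 60.0]])
P, a, b = sinkhorn_scale(N)

# Rescaling the rows and columns of N by arbitrary positive factors
# should leave the scaled table unchanged.
N_rescaled = np.diag([7.0, 0.3]) @ N @ np.diag([2.0, 5.0, 0.1])
P_rescaled, _, _ = sinkhorn_scale(N_rescaled)
```

Under these assumptions `P` and `P_rescaled` agree up to the convergence tolerance, which is the row and column scale invariance that motivates applying correspondence analysis to the scaled table rather than to the raw one.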