Invariant coordinate selection is an unsupervised multivariate data transformation that is useful in many contexts, such as outlier detection and clustering. It is based on the simultaneous diagonalization of two affine equivariant and positive definite scatter matrices. Its classical implementation relies on a non-symmetric eigenvalue problem, diagonalizing one scatter matrix relative to the other. In the case of collinearity, at least one of the scatter matrices is singular, which makes the problem unsolvable. To address this limitation, three approaches are proposed, based respectively on a Moore-Penrose pseudo-inverse, a dimension reduction, and a generalized singular value decomposition. Their properties are investigated both theoretically and through various empirical applications. Overall, the extension based on the generalized singular value decomposition seems the most promising, even though it restricts the choice of scatter matrices to those that can be expressed as cross-products. In practice, some of the approaches also appear suitable for high-dimension low-sample-size data.
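As an illustration of the classical, full-rank setting only, the simultaneous diagonalization can be sketched as a generalized eigenvalue problem. This is a minimal sketch under stated assumptions, not the implementation studied here: the scatter pair (the sample covariance and the fourth-moment scatter, often called COV4) is one common choice assumed for the example, and the function names `cov4` and `ics` are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def cov4(X):
    """Fourth-moment scatter (assumed choice): covariance weighted by
    squared Mahalanobis distances, scaled by 1 / (n * (p + 2))."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1 = np.cov(X, rowvar=False)
    # Squared Mahalanobis distance of each observation with respect to S1.
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S1), Xc)
    return (Xc * d2[:, None]).T @ Xc / (n * (p + 2))

def ics(X):
    """Diagonalize S2 relative to S1 by solving S2 b = rho S1 b.
    Requires S1 to be positive definite (no collinearity)."""
    S1 = np.cov(X, rowvar=False)
    S2 = cov4(X)
    # eigh normalizes the eigenvectors so that B.T @ S1 @ B is the identity.
    rho, B = eigh(S2, S1)
    order = np.argsort(rho)[::-1]          # sort eigenvalues in decreasing order
    rho, B = rho[order], B[:, order]
    Z = (X - X.mean(axis=0)) @ B           # invariant coordinates
    return rho, B, Z

# Toy data: Gaussian noise with a small shifted group acting as outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:5] += 6.0
rho, B, Z = ics(X)
```

The sketch inverts the first scatter matrix and relies on it being positive definite, which is exactly the step that fails when variables are collinear.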