Invariant coordinate selection (ICS) is an unsupervised multivariate data transformation that is useful in many contexts, such as outlier detection and clustering. It is based on the simultaneous diagonalization of two affine equivariant, positive definite scatter matrices. Its classical implementation relies on a non-symmetric eigenvalue problem (EVP), obtained by diagonalizing one scatter matrix relative to the other. In the case of collinearity, at least one of the scatter matrices is singular and the problem cannot be solved. To address this limitation, three approaches are proposed, based on a Moore-Penrose pseudo-inverse (GINV), dimension reduction (DR), and a generalized singular value decomposition (GSVD). Their properties are investigated theoretically and in several empirical applications. Overall, the GSVD-based extension seems the most promising, even though it restricts the choice to scatter matrices that can be expressed as cross-products. In practice, some of the approaches also appear suitable for high-dimension, low-sample-size (HDLSS) data.
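As a minimal sketch of the classical approach and of why collinearity breaks it, the NumPy code below computes ICS with the sample covariance matrix as S1 and the fourth-moment scatter COV4 as S2 (one common pair of scatters, assumed here). It solves the non-symmetric EVP S1^{-1} S2 v = λ v and rescales the eigenvectors so that B S1 B^T = I and B S2 B^T is diagonal. The function `ics_dr` is hypothetical: it illustrates one plausible reading of the DR idea (keep only the principal axes with non-zero variance, then apply classical ICS) and is not the authors' implementation.

```python
import numpy as np


def cov(X):
    """Sample covariance matrix, used here as the first scatter S1."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)


def cov4(X):
    """Fourth-moment scatter COV4 (one common definition), used as S2.

    Each centred observation is weighted by its squared Mahalanobis
    distance r_i^2 and the sum is scaled by 1 / (n * (p + 2)).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    r2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(cov(X)), Xc)
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))


def ics(X, S1=cov, S2=cov4):
    """Classical ICS via the non-symmetric EVP S1^{-1} S2 v = lambda v."""
    s1, s2 = S1(X), S2(X)
    # Requires S1 to be invertible: this step breaks under collinearity
    vals, vecs = np.linalg.eig(np.linalg.solve(s1, s2))
    vals, vecs = vals.real, vecs.real  # eigenvalues are real in theory
    order = np.argsort(vals)[::-1]     # sort by decreasing eigenvalue
    vals, vecs = vals[order], vecs[:, order]
    # Rescale each eigenvector so that v^T S1 v = 1, hence B S1 B^T = I
    vecs = vecs / np.sqrt(np.einsum("ji,jk,ki->i", vecs, s1, vecs))
    B = vecs.T
    Z = (X - X.mean(axis=0)) @ B.T     # invariant coordinates
    return vals, B, Z


def ics_dr(X):
    """Hypothetical DR variant: keep only the non-degenerate principal
    axes, then run classical ICS in that reduced space."""
    Xc = X - X.mean(axis=0)
    _, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Same default tolerance rule as np.linalg.matrix_rank
    tol = d.max() * max(Xc.shape) * np.finfo(float).eps
    r = int(np.sum(d > tol))
    return ics(Xc @ Vt[:r].T)


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
X[:20] += 5.0  # a small shifted group that ICS should help separate
vals, B, Z = ics(X)
print(np.round(vals, 3))

# Add a fourth column equal to the sum of the first two: S1 becomes singular
X_col = np.column_stack([X, X[:, 0] + X[:, 1]])
print(np.linalg.matrix_rank(cov(X_col)))  # 3 < 4, so ics(X_col) is ill-posed
vals_dr, B_dr, Z_dr = ics_dr(X_col)
print(np.round(vals_dr, 3))
```

Because both scatter matrices are affine equivariant, the eigenvalues returned by `ics_dr(X_col)` should coincide with those of `ics(X)`: the collinear fourth column adds no new direction, and the reduced data are an invertible linear transformation of the centred original data.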