PCA (principal component analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. For an arbitrary matrix, they cannot, on their own, determine the size of the reduced dimension; it must instead be supplied as an input. NML (normalized maximum likelihood) is a universal implementation of the Minimum Description Length principle, which gives an objective compression-based criterion for model selection. This work applies NML to PCA. A direct attempt to do so would involve the distributions of singular values of random matrices, which is difficult. A reduction to linear regression with a noisy unitary covariate matrix, however, allows finding closed-form bounds on the NML of PCA.
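To illustrate why the reduced dimension has to be supplied from outside, the following is a minimal sketch of plain PCA via the singular value decomposition. It is not the method of this work and computes no NML quantity; the function name `pca_reduce`, the parameter `k`, and the random test matrix are assumptions introduced only for illustration.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components.

    k is a required input: the decomposition itself does not say
    how many components to keep. (Hypothetical helper for illustration.)
    """
    Xc = X - X.mean(axis=0)                      # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                    # reduced-dimension representation
    components = Vt[:k]                          # top-k principal directions
    return scores, components

# Illustrative data only: 50 samples, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
scores, components = pca_reduce(X, k=3)
```

Every choice of `k` from 1 to 8 yields a valid decomposition here, so nothing in the procedure singles one out; a model-selection criterion such as NML is what would be needed to choose among them.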