The Stochastic Complexity of Principal Component Analysis

PCA (principal component analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. For an arbitrary matrix, they cannot determine the size of the reduced dimension on their own; it must be supplied as an input. NML (normalized maximum likelihood) is a universal implementation of the Minimum Description Length principle, which gives an objective, compression-based criterion for model selection. This work applies NML to PCA. A direct attempt to do so would involve the distributions of singular values of random matrices, which is difficult. A reduction to linear regression with a noisy unitary covariate matrix, however, allows finding closed-form bounds on the NML of PCA.
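For orientation, the standard textbook definition of NML for a parametric model class can be written as below. The notation (data x, maximum-likelihood estimate \hat\theta, stochastic complexity SC) is chosen here for illustration and is not taken from the paper, whose bounds for PCA are not reproduced.

```latex
p_{\mathrm{NML}}(x) = \frac{p\bigl(x;\hat\theta(x)\bigr)}{\int p\bigl(y;\hat\theta(y)\bigr)\,dy},
\qquad
\mathrm{SC}(x) = -\log p_{\mathrm{NML}}(x)
             = -\log p\bigl(x;\hat\theta(x)\bigr) + \log \int p\bigl(y;\hat\theta(y)\bigr)\,dy
```

The second term of SC(x) depends only on the model class, so it acts as a complexity penalty when comparing candidate reduced dimensions.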


Related content

PCA


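In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each retained dimension. Given a set of points in two, three, or more dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.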
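The procedure described above (find the best-fit direction, then repeat among perpendicular directions) can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes power iteration with deflation on the sample covariance matrix, the function names `covariance`, `top_eigenvector`, and `pca` are hypothetical, and the ten sample points are made up for the example. Note that the number of components `k` has to be passed in by the caller.

```python
import math

def covariance(points):
    """Return the sample covariance matrix and the mean-centered points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered

def top_eigenvector(mat, iters=500):
    """Power iteration on a symmetric positive semi-definite matrix.

    Assumes the fixed start vector is not orthogonal to the dominant
    eigenvector; a random start would be more robust.
    """
    d = len(mat)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: no variance left to explain
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue (variance along v).
    eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval

def pca(points, k):
    """Return k principal directions, their variances, and the projected data."""
    cov, centered = covariance(points)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        v, lam = top_eigenvector(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the variance along v so the next direction
        # is found among those perpendicular to it.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return components, variances, scores

# Illustrative 2-D data, strongly correlated along one direction.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
components, variances, scores = pca(points, k=2)
```

Here `components` holds the orthogonal basis vectors, `variances` the variance captured by each, and `scores` the coordinates of each point in that basis. Keeping only the first column of `scores` gives the one-dimensional projection.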
