Principal components analysis has long been used to reduce the dimensionality of datasets. In this paper we demonstrate that, for mode detection, the components of smallest variance, which we call the pettiest components, are more important than the principal ones. We prove that for a multivariate normal or Laplace distribution, "pettiest component analysis" yields boxes of optimal volume, in the sense that their volume is minimal over all possible boxes with the same number of dimensions and fixed probability. This reduction in volume produces an information gain, which we measure using active information. We illustrate our results with a simulation and with a search for modal patterns in digitized images of hand-written digits from the well-known MNIST database; in both cases pettiest components outperform their competitors. In fact, we show that modes obtained with pettiest components generate better written digits for MNIST than those obtained with principal components.
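The core construction can be summarized as keeping the eigenvectors of the sample covariance with the smallest eigenvalues rather than the largest. The following is a minimal sketch of that idea, not the authors' implementation; the function name and the synthetic example are illustrative assumptions.

```python
import numpy as np

def pettiest_components(X, k):
    """Return the k eigenvectors of the sample covariance of X (n x d)
    associated with the k smallest eigenvalues (smallest-variance directions)."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, :k], eigvals[:k]       # keep the k smallest-variance directions

# Illustrative usage: project synthetic Gaussian data onto the 2 pettiest components.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=np.zeros(3),
                            cov=np.diag([5.0, 2.0, 0.1]), size=1000)
V, lam = pettiest_components(X, 2)
scores = (X - X.mean(axis=0)) @ V            # coordinates along the pettiest axes
print(lam)                                    # the two smallest sample variances
```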