To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of the coding rates of each individual class. We clarify its relationships with existing frameworks such as cross-entropy, information bottleneck, information gain, contractive and contrastive learning, and provide theoretical guarantees for learning diverse and discriminative features. The coding rate can be accurately computed from finite samples of degenerate subspace-like distributions, and the principle can be used to learn intrinsic representations in supervised, self-supervised, and unsupervised settings in a unified manner. Empirically, the representations learned using this principle alone are significantly more robust to label corruptions in classification than those learned with cross-entropy, and can lead to state-of-the-art results in clustering mixed data from self-learned invariant features.
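To make the coding rate difference above concrete, here is a minimal NumPy sketch of how such a rate-reduction quantity could be computed from a batch of features, using the $\log\det$ rate-distortion form for subspace-like (Gaussian) sources. The function names, the precision parameter `eps`, and the assumption that features are stored as a $d \times n$ matrix with one column per sample are illustrative choices for this sketch, not the authors' reference implementation.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z, eps): rate needed to code the d x n feature matrix Z up to precision eps."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Delta R: rate of the whole dataset minus the membership-weighted sum of per-class rates."""
    labels = np.asarray(labels)
    d, n = Z.shape
    per_class_rate = 0.0
    for j in np.unique(labels):
        Zj = Z[:, labels == j]          # features assigned to class j
        nj = Zj.shape[1]
        _, logdet = np.linalg.slogdet(np.eye(d) + (d / (nj * eps**2)) * Zj @ Zj.T)
        per_class_rate += (nj / (2.0 * n)) * logdet
    return coding_rate(Z, eps) - per_class_rate
```

In training, one would maximize this quantity over the parameters of the feature mapping (with features typically constrained to the unit sphere); in the unsupervised setting, the label masks would be replaced by learned or estimated membership assignments.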