The Role of Mutual Information in Variational Classifiers

Overfitting is a well-known phenomenon in which a model mimics a particular instance of data too closely (or exactly), and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various, sometimes heuristic, regularization techniques, which are motivated by upper bounds on the generalization error. In this work, we study the generalization error of classifiers that rely on stochastic encodings and are trained on the cross-entropy loss, which is often used in deep learning for classification problems. We derive bounds on the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information between the input features and the corresponding representations in the latent space, which are randomly generated according to the encoding distribution. Our bounds provide an information-theoretic understanding of generalization in the so-called class of variational classifiers, which are regularized by a Kullback-Leibler (KL) divergence term. These results give theoretical grounds for the highly popular KL term in variational inference methods, which was already recognized to act effectively as a regularization penalty. We further observe connections with well-studied notions such as Variational Autoencoders, Information Dropout, Information Bottleneck and Boltzmann Machines. Finally, we perform numerical experiments on MNIST and CIFAR datasets and show that mutual information is indeed highly representative of the behaviour of the generalization error.
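
To make the objective described above concrete, the following is a minimal sketch of a cross-entropy loss with a KL regularization term for a classifier with a stochastic encoding. It is an illustration under stated assumptions, not the construction used in this work: the diagonal Gaussian encoder, the standard normal prior, the linear decoder, the weight `beta`, and all function names are hypothetical choices made for the example. The KL term is relevant to the mutual information discussed above because the KL divergence between the encoding distribution and any fixed prior, averaged over inputs, is an upper bound on the mutual information between inputs and latent representations.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per row.

    Assumption: diagonal Gaussian encoder and standard normal prior.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def cross_entropy(logits, labels):
    """Mean cross-entropy for logits of shape (n, k) and integer labels (n,)."""
    # Subtract the row maximum for numerical stability before the log-sum-exp.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def variational_classifier_loss(mu, log_var, decoder_weights, labels, beta, rng):
    """Cross-entropy plus beta times the average KL term.

    mu, log_var: encoder outputs of shape (n, d), assumed given.
    decoder_weights: hypothetical linear decoder of shape (d, k).
    """
    # Draw one latent sample per input (reparameterized Gaussian sampling).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    logits = z @ decoder_weights
    ce = cross_entropy(logits, labels)
    kl = gaussian_kl_to_standard_normal(mu, log_var).mean()
    return ce + beta * kl, ce, kl


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 8, 4, 3  # toy sizes, chosen only for illustration
    mu = rng.standard_normal((n, d))
    log_var = np.zeros((n, d))
    weights = rng.standard_normal((d, k))
    labels = rng.integers(0, k, size=n)
    total, ce, kl = variational_classifier_loss(mu, log_var, weights, labels, 0.1, rng)
    print(f"loss={total:.4f} cross_entropy={ce:.4f} kl={kl:.4f}")
```

In this sketch a larger `beta` penalizes encodings that deviate from the prior more strongly, which limits how much information about the input the latent representation can carry.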
