Deep learning systems have been reported to achieve state-of-the-art performance in many applications, and one key to this success is the availability of well-trained classifiers on benchmark datasets, which can be used as backbone feature extractors in downstream tasks. As a mainstream loss function for training deep neural network (DNN) classifiers, the cross entropy loss can easily lead to models that exhibit severe overfitting when no mitigating techniques, such as data augmentation, are used. In this paper, we prove that minimizing the existing cross entropy loss for training DNN classifiers essentially learns the conditional entropy of the underlying data distribution, i.e., the information or uncertainty remaining in the label after the input is revealed. We then propose a mutual information learning framework in which DNN classifiers are trained by learning the mutual information between the label and the input. Theoretically, we give a lower bound on the population error probability in terms of the mutual information. In addition, we derive lower and upper bounds on the mutual information for a concrete binary classification data model in $\mbR^n$, as well as the error probability lower bound in this scenario. Furthermore, we establish the sample complexity for accurately learning the mutual information from empirical data samples drawn from the underlying data distribution. Empirically, we conduct extensive experiments on several benchmark datasets to support our theory. Without bells and whistles, the proposed mutual information learned classifiers (MILCs) achieve far better generalization performance than state-of-the-art classifiers, with improvements in testing accuracy that can exceed 10\%.
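For reference, the contrast between the two objectives can be sketched with standard information-theoretic identities (the notation $X$, $Y$, $p(\cdot\mid x)$, $q_\theta(\cdot\mid x)$ below is generic, not the paper's own): the mutual information and the expected cross entropy of a model $q_\theta$ satisfy
\begin{align*}
I(X;Y) &= H(Y) - H(Y\mid X),\\
\mathbb{E}_{X}\!\left[H\!\big(p(\cdot\mid X),\, q_\theta(\cdot\mid X)\big)\right] &= H(Y\mid X) + \mathbb{E}_{X}\!\left[D_{\mathrm{KL}}\!\big(p(\cdot\mid X)\,\|\, q_\theta(\cdot\mid X)\big)\right],
\end{align*}
so the expected cross entropy is minimized over models $q_\theta$ exactly at the conditional entropy $H(Y\mid X)$, whereas learning $I(X;Y)$ additionally requires estimating the marginal label entropy $H(Y)$.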