During training, deep neural networks implicitly learn to represent the input data samples through a hierarchy of features, where the depth of the hierarchy is determined by the number of layers. In this paper, we focus on enhancing the discriminative power of the high-level representations that are typically learned by the deeper layers (those closer to the output). To this end, we introduce a new loss term inspired by the Gini impurity, which aims to minimize the entropy (and thus increase the discriminative power) of individual high-level features with respect to the class labels. Although our Gini loss induces highly discriminative features, it does not ensure that the distribution of the high-level features matches the distribution of the classes. Therefore, we introduce a second loss term that minimizes the Kullback-Leibler divergence between the two distributions. We conduct experiments on two image classification data sets (CIFAR-100 and Caltech 101), considering multiple neural architectures ranging from convolutional networks (ResNet-17, ResNet-18, ResNet-50) to transformers (CvT). Our empirical results show that models trained with our novel loss terms integrated into the objective consistently outperform models trained with cross-entropy alone.
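The abstract does not give the exact formulations of the two loss terms, but both build on standard quantities. As a minimal pure-Python sketch, assuming the Gini term uses the textbook impurity (1 minus the sum of squared probabilities, zero for a one-hot distribution, maximal for a uniform one) and the second term is the standard discrete Kullback-Leibler divergence:

```python
import math

def gini_impurity(p):
    # Gini impurity of a discrete distribution p: 1 - sum_i p_i^2.
    # It is 0 when p is one-hot (a feature perfectly aligned with one class)
    # and maximal when p is uniform, so minimizing it pushes each feature
    # toward being discriminative for a single class.
    return 1.0 - sum(pi * pi for pi in p)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions over the same support.
    # A small eps guards against log(0) and division by zero.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# A one-hot distribution has zero impurity.
print(gini_impurity([1.0, 0.0, 0.0]))  # 0.0
# A uniform distribution over 3 classes has impurity 1 - 3*(1/3)^2 = 2/3.
print(gini_impurity([1/3, 1/3, 1/3]))
# KL divergence of a distribution with itself is (numerically) zero.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))
```

In the paper's setting, the Gini term would be applied per high-level feature against the class labels, while the KL term would compare the aggregate feature distribution with the class distribution; the exact aggregation is not specified in the abstract.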