Convolutional neural networks have shown a remarkable ability to learn discriminative semantic features in image recognition tasks. For classification, however, they often concentrate on specific regions of an image. This work proposes a novel method that combines variant-rich base models so that they concentrate on different important image regions for classification. A feature distance loss is applied while training an ensemble of base models to force them to learn discriminative feature concepts. Experiments on benchmark convolutional neural networks (VGG16, ResNet, AlexNet), popular datasets (Cifar10, Cifar100, miniImageNet, NEU, BSD, TEX), and different numbers of training samples per class (3, 5, 10, 20, 50, 100) show our method's effectiveness and generalization ability. Our method outperforms ensemble versions of the base models trained without the feature distance loss, and the Class Activation Maps explicitly demonstrate its ability to learn different discriminative feature concepts.
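The abstract does not state the exact form of the feature distance loss, so the following is only a minimal sketch of one way such a term could be combined with a classification loss. It assumes the ensemble loss is the mean cross-entropy of the base models minus a weighted mean pairwise Euclidean distance between their feature vectors. The function names, the choice of Euclidean distance, and the weight `lam` are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_pairwise_distance(features):
    """Average distance over all pairs of base-model feature vectors."""
    pairs = list(combinations(features, 2))
    return sum(euclidean(u, v) for u, v in pairs) / len(pairs)


def ensemble_loss(logits_per_model, features_per_model, label, lam=0.1):
    """Assumed form: mean cross-entropy minus lam * mean feature distance.

    Subtracting the distance term rewards base models whose features
    differ from each other; lam is a hypothetical trade-off weight.
    """
    ce = sum(cross_entropy(l, label) for l in logits_per_model)
    ce /= len(logits_per_model)
    return ce - lam * mean_pairwise_distance(features_per_model)


if __name__ == "__main__":
    logits = [[2.0, 0.0], [1.0, 0.0]]          # two base models, two classes
    same = [[1.0, 0.0], [1.0, 0.0]]            # identical features
    diverse = [[1.0, 0.0], [0.0, 1.0]]         # differing features
    print(ensemble_loss(logits, same, label=0))
    print(ensemble_loss(logits, diverse, label=0))
```

Under this assumed formulation, two base models with identical features receive no reward from the distance term, while models with differing features obtain a lower total loss for the same predictions. In practice the distance would be computed on intermediate feature maps in an automatic differentiation framework rather than on plain lists.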