Traditional deep learning algorithms often fail to generalize when tested outside the domain of the training data. The issue can be mitigated by using unlabeled data from the target domain at training time, but because data distributions can change dynamically in real-life applications once a learned model is deployed, it is critical to create networks that are robust to unknown and unforeseen domain shifts. In this paper we focus on one of the reasons behind this lack of robustness: deep networks rely only on the most obvious, potentially spurious, cues to make their predictions and remain blind to useful but slightly less efficient or more complex patterns. This behaviour has been identified before, and several methods have partially addressed the issue. To investigate their effectiveness and limits, we first design a publicly available MNIST-based benchmark that precisely measures the ability of an algorithm to find such "hidden" patterns. We then evaluate state-of-the-art algorithms on our benchmark and show that the issue remains largely unsolved. Finally, we propose a partially reversed contrastive loss that encourages intra-class diversity and the discovery of less strongly correlated patterns; its effectiveness is demonstrated by our experiments.
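To make the idea of a "partially reversed" contrastive loss concrete, the sketch below shows one possible reading: a supervised contrastive objective in which a random fraction of same-class (positive) pairs have their attraction term reversed, so those pairs are pushed apart rather than pulled together, encouraging intra-class diversity. This is a minimal illustration under our own assumptions; the function name, the `reverse_frac` parameter, and the random sign-flipping scheme are hypothetical and may differ from the loss actually proposed in the paper.

```python
import torch
import torch.nn.functional as F

def partially_reversed_contrastive_loss(z, labels, temperature=0.5, reverse_frac=0.3):
    """Hypothetical sketch of a partially reversed supervised contrastive loss.

    z:      (N, D) embedding batch
    labels: (N,)   integer class ids
    reverse_frac: fraction of positive pairs whose attraction is reversed (assumption)
    """
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                          # pairwise similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye

    # InfoNCE-style log-probabilities over all other samples in the batch.
    logits = sim.masked_fill(eye, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Flip the sign of a random fraction of the positive-pair terms: flipped
    # pairs are repelled instead of attracted, promoting intra-class diversity.
    flip = (torch.rand(n, n, device=z.device) < reverse_frac) & pos_mask
    signs = torch.where(flip, -torch.ones_like(sim), torch.ones_like(sim))

    per_pair = -(signs * log_prob) * pos_mask
    return per_pair.sum() / pos_mask.sum().clamp(min=1)
```

As a usage note, this loss would typically be added to the standard classification objective on the learned features, with `reverse_frac` controlling how strongly intra-class spreading is encouraged.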