Deep neural networks often fail to generalize outside of their training distribution, particularly when only a single data domain is available during training. While test-time adaptation has yielded encouraging results in this setting, we argue that, to reach further improvements, these approaches should be combined with training procedure modifications that aim to learn a more diverse set of patterns. Indeed, test-time adaptation methods usually have to rely on a limited representation because of the shortcut learning phenomenon: only a subset of the available predictive patterns is learned with standard training. In this paper, we first show that combining existing training-time strategies with test-time batch normalization, a simple adaptation method, does not always improve upon test-time adaptation alone on the PACS benchmark. Furthermore, experiments on Office-Home show that very few training-time methods improve upon standard training, with or without test-time batch normalization. We therefore propose a novel approach that uses a pair of classifiers and a shortcut patterns avoidance loss. The loss mitigates the shortcut learning behavior by reducing the generalization ability of the secondary classifier, which encourages the learning of sample-specific patterns. The primary classifier is trained normally, resulting in the learning of both the natural and the more complex, less generalizable features. Our experiments show that our method improves upon the state-of-the-art results on both benchmarks and benefits the most from test-time batch normalization.
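To make the adaptation step concrete, the following is a minimal sketch of test-time batch normalization under its common formulation, which we assume here: the normalization statistics stored during training are replaced by statistics computed on the incoming test batch. The function names (`batch_stats`, `batch_norm`, `batch_norm_with_test_stats`) are hypothetical and written in plain Python for self-containment; they are not the implementation used in the experiments, where the same idea would typically be applied inside the batch normalization layers of a deep learning framework.

```python
import math

def batch_stats(batch):
    """Per-feature mean and biased variance of a batch of feature vectors."""
    n = len(batch)
    dim = len(batch[0])
    mean = [sum(row[j] for row in batch) / n for j in range(dim)]
    var = [sum((row[j] - mean[j]) ** 2 for row in batch) / n for j in range(dim)]
    return mean, var

def batch_norm(batch, mean, var, gamma, beta, eps=1e-5):
    """Standard batch normalization with externally supplied statistics."""
    return [
        [gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
         for j in range(len(row))]
        for row in batch
    ]

def batch_norm_with_test_stats(test_batch, gamma, beta, eps=1e-5):
    """Assumed test-time variant: ignore the training statistics and
    re-estimate mean and variance on the test batch itself."""
    mean, var = batch_stats(test_batch)
    return batch_norm(test_batch, mean, var, gamma, beta, eps)

# Illustrative values only: training statistics from a source domain and a
# test batch whose features are shifted and rescaled (a domain shift).
train_mean, train_var = [0.0, 0.0], [1.0, 1.0]
gamma, beta = [1.0, 1.0], [0.0, 0.0]
test_batch = [[5.0, 10.0], [7.0, 14.0], [9.0, 18.0]]

with_train_stats = batch_norm(test_batch, train_mean, train_var, gamma, beta)
with_test_stats = batch_norm_with_test_stats(test_batch, gamma, beta)
```

In this toy setting, normalizing with the training statistics leaves the shifted features far from zero mean, whereas re-estimating the statistics on the test batch re-centers and rescales them. The learned scale and shift parameters (`gamma`, `beta`) are left untouched, which is why the method adapts only the normalization and cannot recover predictive patterns that were never learned during training.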