While deep neural networks can attain good accuracy on in-distribution test points, many applications require robustness even in the face of unexpected perturbations in the input, changes in the domain, or other sources of distribution shift. We study the problem of test time robustification, i.e., using the test input to improve model robustness. Recent prior works have proposed methods for test time adaptation; however, each introduces additional assumptions, such as access to multiple test points, that prevent widespread adoption. In this work, we aim to study and devise methods that make no assumptions about the model training process and are broadly applicable at test time. We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable: when presented with a test example, perform different data augmentations on the data point, and then adapt (all of) the model parameters by minimizing the entropy of the model's average, or marginal, output distribution across the augmentations. Intuitively, this objective encourages the model to make the same prediction across different augmentations, thus enforcing the invariances encoded in these augmentations, while also maintaining confidence in its predictions. In our experiments, we evaluate two baseline ResNet models, two robust ResNet-50 models, and a robust vision transformer model, and we demonstrate that this approach achieves accuracy gains of 1-8\% over standard model evaluation and also generally outperforms prior augmentation and adaptation strategies. For the setting in which only one test point is available, we achieve state-of-the-art results on the ImageNet-C, ImageNet-R, and, among ResNet-50 models, ImageNet-A distribution shift benchmarks.
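To make the adaptation procedure concrete, the sketch below illustrates marginal entropy minimization on a single test point in PyTorch. It is a minimal illustration, not the paper's reference implementation: the augmentation function \texttt{augment}, the number of augmented copies \texttt{n\_aug}, the optimizer choice, the learning rate, and the number of adaptation steps are all assumed hyperparameters not specified in the abstract.

\begin{verbatim}
import torch
import torch.nn.functional as F

def marginal_entropy_loss(model, x, augment, n_aug=32):
    """Entropy of the model's average (marginal) prediction over
    n_aug augmented copies of a single test input x (C x H x W)."""
    batch = torch.stack([augment(x) for _ in range(n_aug)])   # n_aug x C x H x W
    log_probs = F.log_softmax(model(batch), dim=-1)           # n_aug x num_classes
    # Marginal distribution: average the per-augmentation probabilities,
    # computed in log space for numerical stability.
    marginal_log_probs = torch.logsumexp(log_probs, dim=0) \
        - torch.log(torch.tensor(float(n_aug)))
    # Shannon entropy of the marginal distribution.
    return -(marginal_log_probs.exp() * marginal_log_probs).sum()

def adapt_and_predict(model, x, augment, lr=1e-4, steps=1):
    """Adapt all model parameters on one test point, then predict.
    lr and steps are illustrative values, not the paper's settings."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = marginal_entropy_loss(model, x, augment)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        return model(x.unsqueeze(0)).argmax(dim=-1)
\end{verbatim}

In this sketch the model would typically be reset to its original parameters before processing the next test point, since each example is adapted to independently in the single-test-point setting described above.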