While deep neural networks can attain good accuracy on in-distribution test points, many applications require robustness even in the face of unexpected perturbations in the input, changes in the domain, or other sources of distribution shift. We study the problem of test-time robustification, i.e., using the test input to improve model robustness. Recent prior works have proposed methods for test-time adaptation; however, each introduces additional assumptions, such as access to multiple test points, that prevent widespread adoption. In this work, we aim to study and devise methods that make no assumptions about the model training process and are broadly applicable at test time. We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable: when presented with a test example, perform different data augmentations on the data point, and then adapt (all of) the model parameters by minimizing the entropy of the model's average, or marginal, output distribution across the augmentations. Intuitively, this objective encourages the model to make the same prediction across different augmentations, thus enforcing the invariances encoded in these augmentations, while also maintaining confidence in its predictions. In our experiments, we demonstrate that this approach consistently improves robust ResNet and vision transformer models, achieving accuracy gains of 1-8% over standard model evaluation and also generally outperforming prior augmentation and adaptation strategies. We achieve state-of-the-art results for test shifts caused by image corruptions (ImageNet-C), renditions of common objects (ImageNet-R), and, among ResNet-50 models, adversarially chosen natural examples (ImageNet-A).
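The marginal entropy objective described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes per-augmentation logits for a single test point are already available (from any probabilistic classifier) and shows only the objective itself, i.e., the entropy of the averaged softmax distribution. Agreeing, confident augmented predictions yield a low objective value; disagreeing ones yield a high value, which is what adaptation would push down.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def marginal_entropy(logits):
    """Entropy of the average (marginal) predictive distribution
    across augmented copies of one test point.

    logits: array of shape (n_augmentations, n_classes).
    """
    p = softmax(logits)        # one distribution per augmentation
    p_bar = p.mean(axis=0)     # marginal distribution over augmentations
    return float(-(p_bar * np.log(p_bar + 1e-12)).sum())

# Illustrative logits (hypothetical values, 3 classes, 2 augmentations):
agree = np.array([[5.0, 0.0, 0.0],    # both augmentations confidently
                  [4.0, 0.0, 0.0]])   # predict class 0
disagree = np.array([[5.0, 0.0, 0.0],  # augmentations confidently
                     [0.0, 5.0, 0.0]]) # predict different classes

# Consistent, confident predictions give a lower marginal entropy.
print(marginal_entropy(agree) < marginal_entropy(disagree))  # True
```

In the full method, this scalar would be differentiated with respect to all model parameters and minimized with a gradient step before predicting on the test point; the sketch stops at the objective to stay framework-agnostic.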