Test-time adaptation (TTA) enables models to adapt to distribution shifts at inference time. While entropy minimization over the output distribution has proven effective for TTA, transformers offer an additional unsupervised learning signal through their attention mechanisms. We propose minimizing the entropy of the attention distribution from the CLS token to image patches as a novel TTA objective. This approach encourages the model to attend more confidently to relevant image regions under distribution shift and is effective even when only a single test image is available. We demonstrate that attention entropy minimization improves robustness across diverse corruption types without hurting clean-data performance, even on a single-sample stream of images at test time.
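The objective is simple to state in code. Below is a minimal PyTorch sketch of the attention-entropy loss, assuming a ViT-style model whose softmax-normalized attention weights can be extracted (e.g., via forward hooks); the `return_attention` flag in the commented wiring is hypothetical, not an actual library API, and shows only where the loss would plug into an adaptation step.

```python
# Minimal sketch of the attention-entropy TTA objective (assumptions noted above).
import torch


def attention_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Entropy of the CLS-to-patch attention distribution.

    attn: (batch, heads, tokens, tokens) softmax-normalized attention,
          with the CLS token assumed to sit at index 0.
    Returns a scalar loss averaged over heads and the batch.
    """
    cls_to_patch = attn[:, :, 0, 1:]  # CLS row, dropping the CLS->CLS term
    cls_to_patch = cls_to_patch / cls_to_patch.sum(-1, keepdim=True)  # renormalize
    entropy = -(cls_to_patch * torch.log(cls_to_patch + 1e-8)).sum(-1)
    return entropy.mean()


# Toy usage: random attention for a 197-token ViT (1 CLS + 196 patches).
raw = torch.randn(1, 12, 197, 197, requires_grad=True)
attn = torch.softmax(raw, dim=-1)
loss = attention_entropy(attn)
loss.backward()  # in real use, gradients flow into the model's parameters

# Hypothetical wiring for one adaptation step on a single test image:
# logits, attn_maps = model(x, return_attention=True)  # per-layer maps
# loss = attention_entropy(attn_maps[-1])              # last block's attention
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```

Averaging over heads keeps the loss well defined for multi-head attention; which layer's attention to use is a design choice, and the sketch above takes the last block as a plausible default.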