Vision Transformer (ViT) is becoming increasingly popular in image processing. We investigate the effectiveness on ViT of test-time adaptation (TTA), a technique in which a model corrects its own predictions at test time. First, we benchmark various test-time adaptation approaches on ViT-B16 and ViT-L16. We show that TTA is effective on ViT and that the prior convention of carefully selecting modulation parameters is unnecessary when a proper loss function is used. Based on this observation, we propose a new test-time adaptation method called class-conditional feature alignment (CFA), which minimizes both the class-conditional distribution differences and the whole-distribution differences of the hidden representation between the source and target in an online manner. Image classification experiments on common corruption benchmarks (CIFAR-10-C, CIFAR-100-C, and ImageNet-C) and domain adaptation benchmarks (digits datasets and ImageNet-Sketch) show that CFA consistently outperforms the existing baselines across datasets. We also verify that CFA is model agnostic by experimenting on ResNet, MLP-Mixer, and several ViT variants (ViT-AugReg, DeiT, and BeiT). Using a BeiT backbone, CFA achieves a 19.8% top-1 error rate on ImageNet-C, outperforming the existing test-time adaptation baseline of 44.0%. This is a state-of-the-art result among TTA methods that do not require altering the training phase.
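As a rough illustration of the alignment objective described above, the following minimal sketch combines a whole-distribution term with a class-conditional term computed on one target batch. It rests on several assumptions not stated in the abstract: the distance is taken to be the squared difference of feature means, target classes come from the model's own pseudo-labels, and the source statistics are precomputed. The names `alignment_loss`, `source_mean`, and `source_class_means` are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict


def mean_vector(rows):
    """Element-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(r[d] for r in rows) / n for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment_loss(target_feats, pseudo_labels, source_mean, source_class_means):
    """Whole-distribution term plus class-conditional term (assumed: mean matching).

    target_feats:        list of hidden feature vectors for one target batch
    pseudo_labels:       predicted class index for each target sample
    source_mean:         mean source feature vector (precomputed)
    source_class_means:  dict mapping class index to mean source feature vector
    """
    # Whole-distribution term: batch mean against the source mean.
    overall = sq_dist(mean_vector(target_feats), source_mean)

    # Class-conditional term: group target features by pseudo-label.
    by_class = defaultdict(list)
    for feat, label in zip(target_feats, pseudo_labels):
        by_class[label].append(feat)

    class_terms = [
        sq_dist(mean_vector(rows), source_class_means[c])
        for c, rows in by_class.items()
        if c in source_class_means
    ]
    # Average over the classes actually present in this batch.
    conditional = sum(class_terms) / len(class_terms) if class_terms else 0.0
    return overall + conditional
```

In an online setting, a loss of this kind would be evaluated on each incoming target batch and minimized with a gradient step on the model parameters; the gradient step is omitted here because it depends on the framework and on which parameters are updated.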