ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization

Domain generalization (DG) aims to learn a model that generalizes well to unseen target domains utilizing multiple source domains without re-training. Most existing DG works are based on convolutional neural networks (CNNs). However, the local operation of the convolution kernel makes the model focus too much on local representations (e.g., texture), which inherently makes the model more prone to overfitting the source domains and hampers its generalization ability. Recently, several MLP-based methods have achieved promising results in supervised learning tasks by learning global interactions among different patches of the image. Inspired by this, in this paper, we first analyze the difference between CNN and MLP methods in DG and find that MLP methods exhibit better generalization ability because they capture global representations (e.g., structure) better than CNN methods do. Then, based on a recent lightweight MLP method, we obtain a strong baseline that outperforms most state-of-the-art CNN-based methods. The baseline can learn global structure representations with a filter that suppresses structure-irrelevant information in the frequency space. Moreover, we propose a dynAmic LOw-Frequency spectrum Transform (ALOFT) that can perturb local texture features while preserving global structure features, thus enabling the filter to remove structure-irrelevant information sufficiently. Extensive experiments on four benchmarks demonstrate that our method achieves substantial performance improvements over SOTA CNN-based DG methods with a small number of parameters. Our code is available at https://github.com/lingeringlight/ALOFT/.
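
To make the idea of perturbing the low-frequency spectrum concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the perturbation is sampled, so the Gaussian noise scaled by batch statistics, the helper names `low_freq_mask` and `perturb_low_freq`, and the parameters `ratio` and `strength` are all assumptions made for illustration. The linked repository remains the reference for the actual method.

```python
import numpy as np


def low_freq_mask(H, W, ratio):
    """Boolean mask (H, W // 2 + 1) selecting the low-frequency band of an rfft2 spectrum.

    Hypothetical helper; `ratio` is the assumed fraction of the spectrum treated as "low".
    """
    fy = np.abs(np.rint(np.fft.fftfreq(H) * H))  # |row frequency| for each row
    fx = np.rint(np.fft.rfftfreq(W) * W)         # column frequency, 0..W//2
    return (fy[:, None] <= ratio * H / 2) & (fx[None, :] <= ratio * W / 2)


def perturb_low_freq(x, ratio=0.5, strength=1.0, rng=None):
    """Perturb the low-frequency amplitude of feature maps x with shape (B, C, H, W).

    Assumption: the noise model below is illustrative only, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, _, H, W = x.shape

    # Real 2-D FFT over the spatial axes, split into amplitude and phase.
    spec = np.fft.rfft2(x, axes=(-2, -1))
    amp, phase = np.abs(spec), np.angle(spec)

    mask = low_freq_mask(H, W, ratio)
    low = amp[..., mask]                          # shape (B, C, n_low)

    # Assumed noise model: Gaussian noise scaled by the standard deviation
    # of each low-frequency amplitude across the batch.
    std = low.std(axis=0, keepdims=True)
    noise = rng.standard_normal(low.shape)
    amp_new = amp.copy()
    amp_new[..., mask] = np.maximum(low + strength * noise * std, 0.0)

    # Recombine with the untouched phase and return to the spatial domain.
    spec_new = amp_new * np.exp(1j * phase)
    return np.fft.irfft2(spec_new, s=(H, W), axes=(-2, -1))
```

In this sketch only the amplitudes inside the low-frequency mask change. The phase and every amplitude outside the mask are left as they were, and setting `strength=0.0` returns the input unchanged, which makes the helper easy to sanity-check before wiring it into a training loop.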
