Domain generalization (DG) aims to learn a generalizable model from multiple known source domains for unknown target domains. Nowadays, tremendous amounts of data are distributed across many places/devices and cannot be directly accessed due to privacy protection, especially in crucial areas like finance and medical care. However, most existing DG algorithms assume that all source datasets are accessible and can be mixed for domain-invariant semantics extraction, which may fail in real-world applications. In this paper, we introduce a challenging setting: training a generalizable model using distributed source datasets without directly accessing them. We propose a novel method for this setting that first trains a model on each source dataset and then conducts data-free model fusion, which fuses the trained models layer by layer based on their semantic similarities and thereby aggregates different levels of semantics from the distributed sources indirectly. The fused model is then transmitted to and trained on each dataset, where we further introduce cross-layer semantic calibration to enhance domain-invariant semantics by aligning feature maps between the fused model and a fixed local model with an attention mechanism. Extensive experiments on multiple DG datasets demonstrate the strong performance of our method in this challenging setting, which is on par with or even superior to state-of-the-art DG approaches under the standard DG setting.
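To make the data-free fusion step concrete, below is a minimal sketch assuming same-architecture PyTorch models trained at each source site. Since no raw data is exchanged, the sketch uses cosine similarity between flattened layer weights as a stand-in for the paper's semantic-similarity measure; the function name and the softmax weighting scheme are illustrative assumptions, not the actual implementation.

```python
import torch
import torch.nn.functional as F

def fuse_models_layerwise(state_dicts):
    """Fuse several same-architecture models layer by layer (hedged sketch).

    Each fused layer is a weighted average of the corresponding source
    layers; a source layer that is more similar to its counterparts
    (on average) receives a larger fusion weight.
    """
    fused = {}
    num_models = len(state_dicts)
    for name in state_dicts[0]:
        # Stack the same layer from every source model: shape (M, ...).
        layers = torch.stack([sd[name].float() for sd in state_dicts])
        flat = layers.reshape(num_models, -1)  # (M, D)
        # Pairwise cosine similarity between models for this layer (M, M).
        # NOTE: an assumed proxy for "semantic similarity"; data-free.
        sim = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)
        # Weight each model by its mean similarity to the others.
        weights = torch.softmax(sim.mean(dim=1), dim=0)  # (M,)
        shape = (num_models,) + (1,) * (layers.dim() - 1)
        fused[name] = (weights.view(shape) * layers).sum(dim=0)
    return fused
```

Note that only model parameters, never raw samples, leave each source site, which is what keeps the fusion data-free.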
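The calibration stage can be pictured with the following hedged sketch: a frozen copy of the locally trained model provides target feature maps, and an attention mechanism decides which local layers each fused-model layer should align with. The class name, the global-average pooling, the MSE alignment loss, and the assumption that all feature maps share a common shape (B, C, H, W) after projection are illustrative choices, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerCalibration(nn.Module):
    """Attend from each fused-model feature map to all frozen local-model
    feature maps, then penalize the distance to the attended target."""

    def __init__(self, channels):
        super().__init__()
        self.scale = channels ** -0.5  # standard dot-product attention scaling

    def forward(self, fused_feats, local_feats):
        # Global-average-pool each (B, C, H, W) map to a channel descriptor.
        q = torch.stack([f.mean(dim=(2, 3)) for f in fused_feats], dim=1)  # (B, L, C)
        k = torch.stack([f.mean(dim=(2, 3)) for f in local_feats], dim=1)  # (B, L, C)
        # Cross-layer attention: which local layers match each fused layer.
        attn = torch.softmax((q @ k.transpose(1, 2)) * self.scale, dim=-1)  # (B, L, L)
        loss = 0.0
        for i, f in enumerate(fused_feats):
            # Target: attention-weighted mixture of the frozen local features,
            # treated as a fixed target for the alignment loss.
            target = sum(attn[:, i, j].view(-1, 1, 1, 1) * local_feats[j]
                         for j in range(len(local_feats)))
            loss = loss + F.mse_loss(f, target.detach())
        return loss / len(fused_feats)
```

In this reading, each source site would update the fused model with its task loss plus this calibration loss while the local model stays frozen, nudging the fused features toward the domain-invariant semantics already captured locally.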