Unified Domain Adaptive Semantic Segmentation

From arXiv. 34 pages (main paper and supplementary material), 25 figures, 19 tables. Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.

Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS) aims to transfer supervision from a labeled source domain to an unlabeled target domain. Most existing UDA-SS works consider images, whereas recent attempts have extended the task to videos by modeling the temporal dimension. Although the two lines of research share the same major challenge, namely overcoming the underlying domain distribution shift, they have been studied largely independently. This has resulted in fragmented insights, a lack of holistic understanding, and missed opportunities for cross-pollination of ideas. The fragmentation also prevents the unification of methods, leading to redundant efforts and suboptimal knowledge transfer across image and video domains. Based on this observation, we advocate unifying the study of UDA-SS across video and image scenarios, enabling a more comprehensive understanding, synergistic advancements, and efficient knowledge sharing. To that end, we explore unified UDA-SS from a general data augmentation perspective, which serves as a unifying conceptual framework, improves generalization, and opens the door to cross-pollination of ideas, ultimately contributing to the overall progress and practical impact of this field. Specifically, we propose a Quad-directional Mixup (QuadMix) method, which tackles distinct point attributes and feature inconsistencies through four-directional paths for intra- and inter-domain mixing in a feature space. To deal with temporal shifts in videos, we incorporate optical flow-guided feature aggregation across spatial and temporal dimensions for fine-grained domain alignment. Extensive experiments show that our method outperforms state-of-the-art works by large margins on four challenging UDA-SS benchmarks. Our source code and models will be released at https://github.com/ZHE-SAPI/UDASS.
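
The abstract describes QuadMix's four mixing paths only as intra- and inter-domain. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes the four directions are source-to-target, target-to-source, source-to-source and target-to-target, and it borrows a ClassMix-style class mask (pasting the pixels of roughly half the classes in one sample onto another) applied to feature maps. Target labels are assumed to be pseudo-labels, and the helper names `class_mask`, `mix` and `quad_mix` are illustrative.

```python
import numpy as np

# Hypothetical sketch: the four directions below are an ASSUMED reading of
# QuadMix (s->t, t->s, s->s, t->t), using a ClassMix-style mask on features.

def class_mask(labels, rng, ignore_index=255):
    """Binary mask covering roughly half of the classes present in `labels`."""
    classes = np.unique(labels)
    classes = classes[classes != ignore_index]
    if classes.size == 0:
        return np.zeros(labels.shape, dtype=bool)
    k = max(1, classes.size // 2)
    chosen = rng.choice(classes, size=k, replace=False)
    return np.isin(labels, chosen)

def mix(feat_a, lab_a, feat_b, lab_b, mask):
    """Paste the masked pixels of sample A onto sample B.

    feat_*: [C, H, W] feature maps; lab_*: [H, W] label maps; mask: [H, W] bool.
    """
    feat = np.where(mask[None], feat_a, feat_b)  # mask broadcasts over channels
    lab = np.where(mask, lab_a, lab_b)
    return feat, lab

def quad_mix(src, tgt, rng):
    """src, tgt: two (features, labels) pairs each; target labels are pseudo-labels."""
    (fs1, ys1), (fs2, ys2) = src
    (ft1, yt1), (ft2, yt2) = tgt
    m_s = class_mask(ys1, rng)  # classes taken from a source sample
    m_t = class_mask(yt1, rng)  # classes taken from a target sample
    return {
        "s->t": mix(fs1, ys1, ft1, yt1, m_s),  # inter-domain
        "t->s": mix(ft1, yt1, fs1, ys1, m_t),  # inter-domain
        "s->s": mix(fs1, ys1, fs2, ys2, m_s),  # intra-domain
        "t->t": mix(ft1, yt1, ft2, yt2, m_t),  # intra-domain
    }
```

Each entry is a mixed (features, labels) pair that could be fed to a segmentation head with a cross-entropy loss. How the paper actually constructs the masks and weights the four paths is not stated in the abstract.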

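
For the video setting, the abstract mentions optical flow-guided feature aggregation without giving details. A common generic form, shown here as an assumption rather than the paper's module, warps the previous frame's features into the current frame along the optical flow and blends them with the current features. The sketch assumes nearest-neighbour sampling with border clamping, a fixed blending weight `alpha`, and a backward-flow convention in which `flow[y, x]` points from a current-frame pixel to its match in the previous frame; the function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch of generic flow-guided aggregation; NOT the paper's module.

def warp_nearest(feat_prev, flow):
    """Backward-warp feat_prev [C, H, W] into the current frame.

    flow: [H, W, 2]; flow[..., 0] / flow[..., 1] are the x / y displacements
    (in pixels) from each current-frame pixel to its match in the previous frame.
    """
    _, H, W = feat_prev.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Round to the nearest source pixel and clamp to the image border.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, H - 1)
    return feat_prev[:, src_y, src_x]  # -> [C, H, W]

def flow_guided_aggregate(feat_cur, feat_prev, flow, alpha=0.5):
    """Blend current features with flow-warped previous features (assumed fixed alpha)."""
    warped = warp_nearest(feat_prev, flow)
    return alpha * feat_cur + (1.0 - alpha) * warped
```

A learned, per-pixel weight (for example one based on feature similarity between the current and warped features) is a natural replacement for the fixed `alpha`; bilinear sampling would replace the nearest-neighbour lookup in a differentiable implementation.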
