Video semantic segmentation is an essential task for the analysis and understanding of videos. Recent efforts largely focus on supervised video segmentation by learning from fully annotated data, but the learnt models often experience a clear performance drop when applied to videos of a different domain. This paper presents DA-VSN, a domain adaptive video segmentation network that addresses domain gaps in videos via temporal consistency regularization (TCR) over consecutive frames of target-domain videos. DA-VSN consists of two novel and complementary designs. The first is cross-domain TCR, which guides the predictions of target frames to have temporal consistency similar to that of source frames (learnt from annotated source data) via adversarial learning. The second is intra-domain TCR, which guides unconfident predictions of target frames to have temporal consistency similar to that of confident predictions of target frames. Extensive experiments demonstrate the superiority of our proposed domain adaptive video segmentation network, which outperforms multiple baselines consistently by large margins.
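To make the two objectives concrete, below is a minimal, illustrative PyTorch-style sketch of how cross-domain and intra-domain TCR losses could be formed; it is not the paper's implementation. All names (`temporal_consistency`, `cross_domain_tcr_loss`, `intra_domain_tcr_loss`, the `discriminator` module, the entropy threshold) are hypothetical assumptions introduced only for illustration.

```python
# Illustrative sketch of cross-domain and intra-domain TCR losses (assumptions,
# not the authors' code). A domain discriminator module is assumed to be provided.
import torch
import torch.nn.functional as F


def temporal_consistency(pred_t, pred_prev):
    """Difference of softmax predictions of two consecutive frames,
    used here as a simple proxy for temporal consistency."""
    return F.softmax(pred_t, dim=1) - F.softmax(pred_prev, dim=1)


def cross_domain_tcr_loss(discriminator, tgt_tc):
    """Generator-side adversarial loss: push the temporal consistency of
    target frames to be indistinguishable from that of source frames.
    The discriminator is assumed to output a logit that is high for
    source-like temporal consistency maps."""
    d_tgt = discriminator(tgt_tc)
    return F.binary_cross_entropy_with_logits(d_tgt, torch.ones_like(d_tgt))


def intra_domain_tcr_loss(discriminator, tgt_tc, tgt_entropy, threshold=0.5):
    """Adversarial loss within the target domain: push the temporal consistency
    of unconfident target predictions (high entropy) toward that of confident
    ones (low entropy). `tgt_entropy` is a per-sample confidence score with
    shape [batch]; the 0.5 threshold is an arbitrary placeholder."""
    confident = tgt_entropy < threshold
    unconfident = ~confident
    if confident.sum() == 0 or unconfident.sum() == 0:
        return tgt_tc.new_zeros(())
    d_unconf = discriminator(tgt_tc[unconfident])
    # Label 1 = "confident-like": encourage unconfident samples to mimic
    # the temporal consistency statistics of confident samples.
    return F.binary_cross_entropy_with_logits(d_unconf, torch.ones_like(d_unconf))
```

In this sketch the segmentation network would be trained with the supervised source loss plus the two TCR terms, while separate discriminators are trained to tell source from target (cross-domain) and confident from unconfident target samples (intra-domain); the details of the actual DA-VSN training scheme are given in the paper itself.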