Domain shift has long been one of the primary challenges in video object segmentation (VOS): models degrade when tested on unfamiliar datasets. Recently, many online methods have emerged to narrow the performance gap between training data (the source domain) and test data (the target domain) by fine-tuning on test-data annotations, which are usually scarce. In this paper, we propose a novel method to tackle domain shift by introducing adversarial domain adaptation to the VOS task for the first time, with supervised training on the source domain and unsupervised training on the target domain. By fusing appearance and motion features with a convolution layer, and by adding supervision to the motion branch, our model achieves state-of-the-art performance on DAVIS2016, reaching an 82.6% mean IoU score after supervised training. Meanwhile, our adversarial domain adaptation strategy significantly improves the performance of the trained model on FBMS59 and Youtube-Object, without exploiting any extra annotations.
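To make the two ingredients named above concrete, here is a minimal sketch (not the authors' released code) assuming a PyTorch setup: (1) appearance and motion feature maps fused by a convolution layer, with an auxiliary prediction head supervising the motion branch, and (2) adversarial domain adaptation implemented with a gradient-reversal layer feeding a binary source-vs-target domain classifier. All module names, channel sizes, and head designs here are illustrative assumptions.

```python
# Illustrative sketch of feature fusion + adversarial domain adaptation.
# Channel sizes and head architectures are assumptions, not the paper's spec.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients backward,
    so the feature extractor learns to *confuse* the domain classifier."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None


class FusionSegHead(nn.Module):
    """Fuses appearance and motion features, predicts a segmentation mask,
    adds auxiliary supervision on the motion branch, and exposes a domain
    logit trained adversarially on unlabeled target-domain frames."""

    def __init__(self, app_ch=256, mot_ch=256, fused_ch=256):
        super().__init__()
        # (1) fuse the two streams with a convolution layer
        self.fuse = nn.Conv2d(app_ch + mot_ch, fused_ch, kernel_size=1)
        self.seg_head = nn.Conv2d(fused_ch, 1, kernel_size=1)  # mask logits
        self.mot_head = nn.Conv2d(mot_ch, 1, kernel_size=1)    # motion-branch supervision
        # (2) domain discriminator: source (0) vs. target (1)
        self.domain_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(fused_ch, 1)
        )

    def forward(self, app_feat, mot_feat, lamb=1.0):
        fused = torch.relu(self.fuse(torch.cat([app_feat, mot_feat], dim=1)))
        mask_logit = self.seg_head(fused)
        mot_logit = self.mot_head(mot_feat)
        # reversed gradients align source and target feature distributions
        dom_logit = self.domain_head(GradReverse.apply(fused, lamb))
        return mask_logit, mot_logit, dom_logit
```

In such a scheme, a source-domain batch would contribute segmentation, motion, and domain losses (domain label 0), while a target-domain batch would contribute only the domain loss (label 1); this is what allows the target domain to be trained without any extra annotations.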