Domain adaptation is an important task for enabling learning when labels are scarce. While most works focus only on the image modality, there are many important multi-modal datasets. In order to leverage multi-modality for domain adaptation, we propose cross-modal learning, where we enforce consistency between the predictions of two modalities via mutual mimicking. We constrain our network to make correct predictions on labeled data and consistent predictions across modalities on unlabeled target-domain data. Experiments in unsupervised and semi-supervised domain adaptation settings demonstrate the effectiveness of this novel domain adaptation strategy. Specifically, we evaluate on the task of 3D semantic segmentation using the image and point cloud modalities. We leverage recent autonomous driving datasets to produce a wide variety of domain adaptation scenarios, including changes in scene layout, lighting, sensor setup and weather, as well as the synthetic-to-real setup. Our method significantly improves over previous uni-modal adaptation baselines on all adaptation scenarios. Code will be made available.
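To make the cross-modal mutual-mimicking idea concrete, below is a minimal sketch of a consistency term between a 2D image branch and a 3D point cloud branch. The specific loss form (symmetric KL divergence with detached targets) and the function name `cross_modal_loss` are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def cross_modal_loss(logits_2d: torch.Tensor, logits_3d: torch.Tensor) -> torch.Tensor:
    """Mutual mimicking between modalities (illustrative sketch).

    Each branch's predicted class distribution is pushed toward the other
    branch's (detached) prediction via KL divergence, so the two modalities
    learn to agree without one branch directly supervising gradients in the other.
    """
    p_2d = F.softmax(logits_2d, dim=1)        # image-branch probabilities
    p_3d = F.softmax(logits_3d, dim=1)        # point-cloud-branch probabilities
    log_p_2d = F.log_softmax(logits_2d, dim=1)
    log_p_3d = F.log_softmax(logits_3d, dim=1)

    # 2D mimics 3D and vice versa; targets are detached so each branch
    # only learns to match the other, not to shift the other's output.
    loss_2d = F.kl_div(log_p_2d, p_3d.detach(), reduction="batchmean")
    loss_3d = F.kl_div(log_p_3d, p_2d.detach(), reduction="batchmean")
    return loss_2d + loss_3d

# Usage sketch: on labeled source data, combine the usual supervised
# segmentation loss with this consistency term; on unlabeled target-domain
# data, apply only the cross-modal consistency term.
```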