Domain adaptation is an important task for enabling learning when labels are scarce. While most works focus only on the image modality, many important datasets are multi-modal. To leverage multi-modality for domain adaptation, we propose cross-modal learning, where we enforce consistency between the predictions of two modalities via mutual mimicking. We constrain our network to make correct predictions on labeled data and consistent predictions across modalities on unlabeled target-domain data. Experiments in unsupervised and semi-supervised domain adaptation settings demonstrate the effectiveness of this novel domain adaptation strategy. Specifically, we evaluate on the task of 3D semantic segmentation from either the 2D image, the 3D point cloud, or both. We leverage recent driving datasets to produce a wide variety of domain adaptation scenarios, including changes in scene layout, lighting, sensor setup and weather, as well as the synthetic-to-real setup. Our method significantly improves over previous uni-modal adaptation baselines on all adaptation scenarios. Our code is publicly available at https://github.com/valeoai/xmuda_journal
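To make the mutual-mimicking idea concrete, below is a minimal sketch of a cross-modal consistency term in PyTorch, assuming each modality produces per-point class logits (e.g. image features projected onto the 3D points). All names and the exact formulation are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def cross_modal_loss(logits_2d: torch.Tensor, logits_3d: torch.Tensor) -> torch.Tensor:
    """Mutual mimicking: each modality's prediction is pushed towards the
    other modality's (detached) prediction via a KL-divergence term.

    Both tensors are assumed to hold per-point class logits of shape (N, C).
    Names and reduction choices are illustrative, not the repo's exact API.
    """
    log_p_2d = F.log_softmax(logits_2d, dim=1)
    log_p_3d = F.log_softmax(logits_3d, dim=1)
    # Detach the targets so each branch only mimics, never drags, the other.
    p_2d = log_p_2d.exp().detach()
    p_3d = log_p_3d.exp().detach()
    # KL(target || prediction), averaged over the batch of points.
    loss_2d = F.kl_div(log_p_2d, p_3d, reduction="batchmean")
    loss_3d = F.kl_div(log_p_3d, p_2d, reduction="batchmean")
    return loss_2d + loss_3d
```

On labeled source data this term would be combined with the usual per-modality cross-entropy loss; on unlabeled target data it is the only signal, enforcing consistent predictions across modalities.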