In this report, we describe the technical details of our submission to the 2021 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition. Leveraging multiple modalities has been shown to benefit the Unsupervised Domain Adaptation (UDA) task. In this work, we present the Multi-Modal Mutual Enhancement Module (M3EM), a deep module that jointly considers information from multiple modalities to find the most transferable representations across domains. We achieve this with two sub-modules, each of which enhances a modality using the context of the other modalities. The first sub-module exchanges information across modalities through the semantic space, while the second sub-module finds the most transferable spatial region based on the consensus of all modalities.
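To make the two sub-modules more concrete, the following is a minimal sketch of one way such mutual enhancement could be wired together. It is not the authors' implementation: the function names (`semantic_exchange`, `spatial_consensus`), the sigmoid channel gating, the residual form, and the averaged softmax attention map are all assumptions chosen for illustration, and the features are plain nested lists of shape [channels][positions] rather than real network activations.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_descriptor(feat):
    """Global average pooling: feat is [C][P], returns C channel means."""
    return [sum(channel) / len(channel) for channel in feat]


def semantic_exchange(feats):
    """Hypothetical semantic sub-module.

    Gates each modality's channels with the pooled channel context of the
    OTHER modalities. Assumes at least two modalities with equal channel count.
    """
    descs = {m: channel_descriptor(f) for m, f in feats.items()}
    enhanced = {}
    for m, f in feats.items():
        others = [descs[o] for o in feats if o != m]
        # Mean channel descriptor over the other modalities.
        context = [sum(vals) / len(vals) for vals in zip(*others)]
        gates = [sigmoid(c) for c in context]
        # Residual gating (an assumption): x + g * x keeps the original signal.
        enhanced[m] = [[x * (1.0 + g) for x in channel]
                       for channel, g in zip(f, gates)]
    return enhanced


def spatial_consensus(feats):
    """Hypothetical spatial sub-module.

    Builds one softmax attention map per modality from channel-averaged
    energy, averages the maps into a consensus, and reweights every
    modality's positions by that consensus.
    """
    maps = []
    for f in feats.values():
        energy = [sum(col) / len(col) for col in zip(*f)]  # one value per position
        peak = max(energy)
        exps = [math.exp(e - peak) for e in energy]  # numerically stable softmax
        total = sum(exps)
        maps.append([e / total for e in exps])
    consensus = [sum(vals) / len(vals) for vals in zip(*maps)]
    reweighted = {m: [[x * w for x, w in zip(channel, consensus)]
                      for channel in f]
                  for m, f in feats.items()}
    return reweighted, consensus


# Toy input: two modalities, two channels, two spatial positions each.
feats = {
    "rgb": [[1.0, 2.0], [0.0, 0.0]],
    "flow": [[0.0, 4.0], [2.0, 2.0]],
}
enhanced = semantic_exchange(feats)
reweighted, consensus = spatial_consensus(enhanced)
```

In this sketch the consensus map is shared by all modalities, so a region is emphasized only when the modalities agree on it; a learned version would replace the fixed pooling and gating with trainable layers.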