Although action recognition has achieved impressive results in recent years, both the collection and annotation of video training data remain time-consuming and cost-intensive. Image-to-video adaptation has therefore been proposed to exploit label-free web image sources for adaptation to unlabeled target videos. This poses two major challenges: (1) the spatial domain shift between web images and video frames, and (2) the modality gap between image and video data. To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation that, on the one hand, leverages the joint spatial information in images and videos and, on the other hand, trains an independent spatio-temporal model to bridge the modality gap. We alternate between spatial and spatio-temporal learning, with knowledge transfer between the two in each cycle. We evaluate our approach on benchmark datasets for image-to-video as well as mixed-source domain adaptation, achieving state-of-the-art results and demonstrating the benefits of our cyclic adaptation.
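To make the alternation concrete, the following is a minimal Python sketch of one possible training loop matching the cycle structure described above. It is not the authors' implementation: the model classes (`SpatialModel`, `SpatioTemporalModel`) and all helper functions (`train_spatial`, `pseudo_label_videos`, `train_video`, `pseudo_label_frames`, `frames_of`) are hypothetical placeholders standing in for the respective training and pseudo-labeling stages.

```python
# Minimal sketch of the cyclic alternation, not the authors' implementation.
# All classes and helper functions below are hypothetical placeholders.

def cycda(web_images, target_videos, num_cycles=3):
    spatial_model = SpatialModel()        # image-level (spatial) model
    video_model = SpatioTemporalModel()   # independent spatio-temporal model
    frame_pseudo_labels = None
    for _ in range(num_cycles):
        # Spatial learning: align labeled web images with unlabeled target
        # frames, optionally guided by pseudo-labels from the previous cycle.
        train_spatial(spatial_model, web_images, frames_of(target_videos),
                      pseudo_labels=frame_pseudo_labels)
        # Knowledge transfer (spatial -> spatio-temporal):
        # pseudo-label the target videos with the spatial model.
        video_pseudo_labels = pseudo_label_videos(spatial_model, target_videos)
        # Spatio-temporal learning: train the video model on pseudo-labeled
        # target videos to bridge the image-to-video modality gap.
        train_video(video_model, target_videos, video_pseudo_labels)
        # Knowledge transfer (spatio-temporal -> spatial) for the next cycle.
        frame_pseudo_labels = pseudo_label_frames(video_model, target_videos)
    return video_model
```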