Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images

Deep-learning-based object detectors require thousands of diverse examples annotated with bounding boxes and class labels. Although image object detectors have progressed rapidly in recent years with the release of multiple large-scale static image datasets, object detection in videos remains an open problem because annotated video frames are scarce. A robust video object detector is an essential component for video understanding and for curating large-scale automated annotations in videos. The domain difference between images and videos makes transferring image object detectors to videos sub-optimal. The most common solution is to use weakly supervised annotations, where each video frame has to be tagged for the presence or absence of object categories. This still requires manual effort. In this paper, we take a step forward by adapting the concept of unsupervised adversarial image-to-image translation to perturb high-quality static images so that they are visually indistinguishable from a set of video frames. We assume the presence of a fully annotated static image dataset and an unannotated video dataset. An object detector is then trained on the adversarially transformed image dataset using the annotations of the original dataset. Experiments on the Youtube-Objects and Youtube-Objects-Subset datasets with two contemporary baseline object detectors reveal that such unsupervised pixel-level domain adaptation boosts generalization performance on video frames compared with directly applying the original image object detector. We also achieve competitive performance compared with recent weakly supervised baselines. This paper can be seen as an application of image translation to cross-domain object detection.
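The data flow described above can be illustrated with a minimal sketch: each annotated static image is passed through an image-to-image generator, and the original boxes and labels are reused unchanged because a pixel-level translation keeps the image geometry. Everything named here is an assumption for illustration only: `Annotated`, `adapt_dataset`, and `placeholder_generator` are hypothetical, and the placeholder merely lowers contrast in place of a trained adversarial generator. It does not reproduce the paper's network or training procedure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]        # grayscale image, pixel values in [0, 1]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


@dataclass
class Annotated:
    image: Image
    boxes: List[Box]
    labels: List[str]


def placeholder_generator(image: Image) -> Image:
    """Hypothetical stand-in for a trained image-to-video-frame generator.

    It only compresses contrast around mid-gray; the image size is unchanged.
    """
    return [[0.5 + 0.6 * (p - 0.5) for p in row] for row in image]


def adapt_dataset(
    source: List[Annotated], generator: Callable[[Image], Image]
) -> List[Annotated]:
    """Translate every image and carry the original annotations over."""
    adapted = []
    for example in source:
        translated = generator(example.image)
        # Reusing the boxes is only valid if the geometry is preserved.
        if len(translated) != len(example.image) or any(
            len(new_row) != len(old_row)
            for new_row, old_row in zip(translated, example.image)
        ):
            raise ValueError("generator changed the image size")
        adapted.append(
            Annotated(translated, list(example.boxes), list(example.labels))
        )
    return adapted


source = [Annotated([[0.0, 1.0], [1.0, 0.0]], [(0, 0, 1, 1)], ["cat"])]
adapted = adapt_dataset(source, placeholder_generator)
```

A detector would then be trained on `adapted` rather than `source`; the unannotated video frames are needed only to train the generator, which this sketch leaves out.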
