We present SSOD, the first end-to-end analysis-by-synthesis framework with controllable GANs for the task of self-supervised object detection. We use collections of real-world images without bounding box annotations to learn to synthesize and detect objects. We leverage controllable GANs to synthesize images with pre-defined object properties and use them to train object detectors. We propose a tight end-to-end coupling of the synthesis and detection networks to optimally train our system. Finally, we also propose a method to optimally adapt SSOD to intended target data without requiring labels for it. For the task of car detection, on the challenging KITTI and Cityscapes datasets, we show that SSOD outperforms the prior state-of-the-art purely image-based self-supervised object detection method, Wetectron. Even without requiring any 3D CAD assets, it also surpasses the state-of-the-art rendering-based method Meta-Sim2. Our work advances the field of self-supervised object detection by introducing a successful new paradigm of using controllable GAN-based image synthesis for it and by significantly improving the baseline accuracy of the task. We open-source our code at https://github.com/NVlabs/SSOD.