Training a robust object detector under supervised learning is challenging when annotated data are scarce. Previous approaches to this problem fall into two categories: semi-supervised learning models that interpolate labeled data from unlabeled data, and self-supervised learning approaches that exploit signals within unlabeled data via pretext tasks. To integrate seamlessly with existing supervised object detection methods and enhance them, this work addresses data scarcity from a fundamental viewpoint, without changing the supervised learning paradigm. We propose a new offline data augmentation method for object detection that semantically interpolates the training data with novel views. Specifically, our system generates controllable views of training images using differentiable neural rendering, together with the corresponding bounding box annotations, which require no human intervention. First, we extract pixel-aligned image features and project them into point clouds while estimating depth maps. We then re-project the point clouds with a target camera pose and render a novel-view 2D image. Objects are marked as keypoints in the point clouds so that their annotations can be recovered in the new views. Our method is fully compatible with online data augmentation methods such as affine transforms and image mixup. Extensive experiments show that our method, as a cost-free tool to enrich images and labels, can significantly boost the performance of object detection systems with scarce training data. Code is available at \url{https://github.com/Guanghan/DANR}.
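The geometric core of the annotation transfer described above (lift keypoints to 3D using depth, move them to a target camera pose, project back, and read off a new box) can be illustrated with a minimal NumPy sketch. This is not the released DANR code: the function names, the pinhole intrinsics `K`, the relative pose `(R, t)`, and the choice to sample a regular grid of keypoints inside each box are all assumptions made for illustration, and the learned feature extraction and neural rendering stages are omitted entirely.

```python
import numpy as np


def unproject(uv, depth, K):
    """Lift pixels (N, 2) with per-pixel depth (N,) to camera-frame 3D points (N, 3)."""
    pixels_h = np.hstack([uv, np.ones((uv.shape[0], 1))])  # homogeneous pixels
    rays = pixels_h @ np.linalg.inv(K).T                   # rays with z = 1
    return rays * depth[:, None]


def project(points, K):
    """Project camera-frame 3D points (N, 3) to pixel coordinates (N, 2)."""
    proj = points @ K.T
    return proj[:, :2] / proj[:, 2:3]


def reproject_bbox(bbox, depth_map, K, R, t, samples=5):
    """Transfer an axis-aligned box (x0, y0, x1, y1) to a novel view.

    Hypothetical helper: keypoints are sampled on a grid inside the box
    (an assumption), lifted to 3D, moved by the pose (R, t), re-projected,
    and enclosed by a new axis-aligned box.
    """
    x0, y0, x1, y1 = bbox
    xs = np.linspace(x0, x1, samples)
    ys = np.linspace(y0, y1, samples)
    keypoints = np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 2)

    # Look up depth at the nearest pixel for each keypoint.
    cols = np.clip(np.round(keypoints[:, 0]).astype(int), 0, depth_map.shape[1] - 1)
    rows = np.clip(np.round(keypoints[:, 1]).astype(int), 0, depth_map.shape[0] - 1)
    depth = depth_map[rows, cols]

    points = unproject(keypoints, depth, K)   # pixels -> point cloud
    points_new = points @ R.T + t             # source camera -> target camera
    uv_new = project(points_new, K)           # point cloud -> novel-view pixels

    return np.array([uv_new[:, 0].min(), uv_new[:, 1].min(),
                     uv_new[:, 0].max(), uv_new[:, 1].max()])


# Toy usage: a flat scene 2 units away and a camera-frame shift of 0.2 along x.
K = np.array([[100.0, 0.0, 64.0],
              [0.0, 100.0, 64.0],
              [0.0, 0.0, 1.0]])
depth_map = np.full((128, 128), 2.0)
new_box = reproject_bbox([40, 50, 80, 90], depth_map, K,
                         R=np.eye(3), t=np.array([0.2, 0.0, 0.0]))
```

In this toy setup every keypoint shifts by `fx * 0.2 / 2 = 10` pixels along x, so the box moves from `[40, 50, 80, 90]` to approximately `[50, 50, 90, 90]`. With real, non-constant depth the re-projected keypoints no longer form a rectangle, which is why the sketch takes the enclosing min/max box.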