The usefulness of deep learning models in robotics depends largely on the availability of training data. Manual annotation of training data is often infeasible. Synthetic data is a viable alternative, but suffers from a domain gap. We propose a multi-step method to obtain training data without manual annotation effort: From 3D object meshes, we generate images using a modern synthesis pipeline. We utilize a state-of-the-art image-to-image translation method to adapt the synthetic images to the real domain, minimizing the domain gap in a learned manner. The translation network is trained from unpaired images, i.e., it requires only an unannotated collection of real images. The generated and refined images can then be used to train deep learning models for a particular task. We also propose and evaluate extensions to the translation method that further increase performance, such as patch-based training, which shortens training time and increases global consistency. We evaluate our method and demonstrate its effectiveness on two robotic datasets. Finally, we give insight into the learned refinement operations.
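The patch-based training idea mentioned above can be illustrated with a minimal sketch: instead of feeding full frames to the translation network, random square crops are sampled from each image, which reduces per-step memory and training time. The function name and parameters below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def random_patches(image, patch_size, n, rng=None):
    """Sample n random square patches from an H x W x C image.

    Hypothetical sketch of patch-based training input preparation:
    crops (rather than full frames) are fed to the translation network.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    # Top-left corners chosen uniformly so each patch fits in the image.
    ys = rng.integers(0, h - patch_size + 1, size=n)
    xs = rng.integers(0, w - patch_size + 1, size=n)
    return np.stack([image[y:y + patch_size, x:x + patch_size]
                     for y, x in zip(ys, xs)])

# Example: draw 8 crops of 128x128 pixels from a 480x640 RGB frame.
synthetic = np.zeros((480, 640, 3), dtype=np.uint8)
batch = random_patches(synthetic, patch_size=128, n=8, rng=0)
```

Here `batch` has shape `(8, 128, 128, 3)`; during training, each crop would be treated as an independent sample.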