We propose a framework for robust and efficient training of Dense Object Nets (DON) with a focus on multi-object robot manipulation scenarios. DON is a popular approach for obtaining dense, view-invariant object descriptors, which can be used for a multitude of downstream tasks in robot manipulation, such as pose estimation and state representation for control. However, the original work focused training on singulated objects, with limited results on instance-specific, multi-object applications. Additionally, a complex data collection pipeline, including 3D reconstruction and mask annotation of each object, is required for training. In this paper, we further improve the efficacy of DON with a simplified data collection and training regime that consistently yields higher precision and enables robust tracking of keypoints with lower data requirements. In particular, we focus on training with multi-object data instead of singulated objects, combined with a well-chosen augmentation scheme. We additionally propose an alternative loss formulation to the original pixelwise formulation that offers better results and is less sensitive to hyperparameters. Finally, we demonstrate the robustness and accuracy of our proposed framework on a real-world robotic grasping task.