Addressing the following issues is crucial for ubiquitous robotic manipulation applications: (a) vision-based manipulation tasks require the robot to visually learn and understand objects through rich representations such as dense object descriptors; and (b) sim-to-real transfer in robotics aims to close the gap between simulated and real data. In this paper, we present Sim-to-Real Dense Object Nets (SRDONs), a dense object descriptor that not only represents objects appropriately but also maps simulated and real data into a unified feature space with pixel consistency. We propose an object-to-object matching method for image pairs drawn from different scenes and different domains. This method reduces the effort of collecting real-world training data by taking advantage of public datasets such as GraspNet. With sim-to-real object representation consistency, our SRDONs can serve as a building block for a variety of sim-to-real manipulation tasks. We demonstrate in experiments that pre-trained SRDONs significantly improve performance on unseen objects and in unseen visual environments across various robotic tasks with zero real-world training.