Transparent objects are common in our daily life and are frequently handled on automated production lines. Robust vision-based robotic grasping and manipulation of these objects would therefore benefit automation. However, most current grasping algorithms fail in this setting because they rely heavily on depth images, and ordinary depth sensors usually cannot produce accurate depth information for transparent objects owing to the reflection and refraction of light. In this work, we address this issue by contributing a large-scale real-world dataset for transparent object depth completion, containing 57,715 RGB-D images from 130 different scenes. Our dataset is the first large-scale, real-world dataset that provides ground-truth depth, surface normals, and transparent object masks for diverse, cluttered scenes. Cross-domain experiments show that our dataset is more general and enables better generalization for models trained on it. Moreover, we propose an end-to-end depth completion network that takes an RGB image and the inaccurate depth map as inputs and outputs a refined depth map. Experiments demonstrate the superior efficacy, efficiency, and robustness of our method over previous works, and show that it can process high-resolution images under limited hardware resources. Real-robot experiments show that our method can also be applied robustly to grasping novel transparent objects. The full dataset and our method are publicly available at www.graspnet.net/transcg.
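To make the stated interface of the depth completion network concrete, the following PyTorch sketch shows a minimal model that takes an RGB image and an inaccurate depth map and outputs a refined depth map. This is only an illustrative stand-in under assumed conventions (channel-wise concatenation of inputs, residual prediction on top of the raw depth), not the architecture proposed in this paper; the class name and layer choices are hypothetical.

```python
import torch
import torch.nn as nn


class DepthCompletionSketch(nn.Module):
    """Illustrative RGB-D depth completion interface (NOT the paper's network).

    Inputs:  rgb (B, 3, H, W), raw_depth (B, 1, H, W) with holes/noise
             on transparent surfaces.
    Output:  refined depth map of shape (B, 1, H, W).
    """

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            # 4 input channels: 3 for RGB + 1 for the raw depth map.
            nn.Conv2d(4, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # 1 output channel: the predicted depth correction.
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, rgb: torch.Tensor, raw_depth: torch.Tensor) -> torch.Tensor:
        # Fuse the two modalities by channel-wise concatenation.
        x = torch.cat([rgb, raw_depth], dim=1)
        # Predict a residual so the network mainly corrects erroneous regions
        # (e.g. missing or distorted depth on transparent objects).
        return raw_depth + self.net(x)


if __name__ == "__main__":
    model = DepthCompletionSketch()
    rgb = torch.rand(1, 3, 240, 320)        # normalized RGB image
    raw = torch.rand(1, 1, 240, 320)        # sensor depth with errors
    refined = model(rgb, raw)
    print(refined.shape)                    # torch.Size([1, 1, 240, 320])
```

The residual formulation is one common design choice for depth refinement, since most of the raw depth map is already accurate and only the transparent regions need substantial correction.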