MVGrasp: Real-Time Multi-View 3D Object Grasping in Highly Cluttered Environments

Service robots are increasingly entering our daily lives. In such dynamic environments, a robot frequently faces piled, packed, or isolated objects. Therefore, the robot must know how to grasp and manipulate various objects in different situations to help humans with everyday tasks. Most state-of-the-art grasping approaches address four degrees-of-freedom (DoF) object grasping, where the robot is forced to grasp objects from above based on grasp synthesis of a given top-down scene. Although such approaches perform very well in predefined industrial settings, they are not suitable for human-centric environments, because the robot will not be able to grasp a range of household objects robustly; for example, grasping a bottle from above is not stable. In this work, we propose a multi-view deep learning approach for robust object grasping in human-centric domains. In particular, our approach takes a partial point cloud of a scene as input and generates multiple views of the objects in it. The obtained views of each object are then used to estimate a pixel-wise grasp synthesis for that object. To evaluate the proposed approach, we performed extensive experiments in both simulated and real-world environments with piled, packed, and isolated object scenarios. Experimental results showed that our approach can estimate appropriate grasp configurations in only 22 ms without the need for explicit collision checking. Therefore, the proposed approach can be used in real-time robotic applications that need closed-loop grasp planning.
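
The pipeline described in the abstract (partial point cloud in, several views per object, pixel-wise grasp estimate out) can be illustrated with a minimal sketch. The abstract does not specify how views are rendered or how a grasp is selected, so everything below is an assumption made for illustration: views are orthographic depth images along the three coordinate axes, the grasp-quality maps stand in for the output of a learned network, and the names `orthographic_depth_view`, `multi_view_depth`, and `best_grasp_pixel` are hypothetical.

```python
import numpy as np


def orthographic_depth_view(points, axis, resolution=64):
    """Project an (N, 3) point cloud along `axis` into a square depth image.

    Assumption: the virtual camera looks down the chosen axis from the
    far (maximum) side of the object. Each pixel keeps the depth of the
    closest point; pixels with no point stay at +inf.
    """
    points = np.asarray(points, dtype=float)
    plane_axes = [a for a in range(3) if a != axis]
    uv = points[:, plane_axes]

    # Use one scale for both image axes so the aspect ratio is preserved.
    lo = uv.min(axis=0)
    scale = max((uv.max(axis=0) - lo).max(), 1e-9)
    pix = np.floor((uv - lo) / scale * (resolution - 1)).astype(int)

    # Depth is measured from the plane touching the object on the camera side.
    depth_vals = points[:, axis].max() - points[:, axis]

    depth = np.full((resolution, resolution), np.inf)
    np.minimum.at(depth, (pix[:, 0], pix[:, 1]), depth_vals)
    return depth


def multi_view_depth(points, resolution=64):
    """Return one depth image per coordinate axis (three views in total)."""
    return [orthographic_depth_view(points, axis, resolution) for axis in range(3)]


def best_grasp_pixel(quality_maps):
    """Pick the (view index, row, col) with the highest grasp quality.

    `quality_maps` is a hypothetical stand-in for per-view, pixel-wise
    network predictions; no network is implemented here.
    """
    best = None
    for view_idx, q in enumerate(quality_maps):
        row, col = np.unravel_index(np.argmax(q), q.shape)
        score = q[row, col]
        if best is None or score > best[0]:
            best = (score, view_idx, int(row), int(col))
    return best[1:]


# Usage with synthetic data: a random partial cloud and random quality maps.
rng = np.random.default_rng(0)
cloud = rng.uniform(-0.05, 0.05, size=(500, 3))
views = multi_view_depth(cloud, resolution=32)
fake_quality = [rng.random(v.shape) for v in views]
view_idx, row, col = best_grasp_pixel(fake_quality)
```

Picking the single highest-scoring pixel across views is only one possible selection rule; it is used here because it keeps the example short.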
