Real-Time Deep Learning Approach to Visual Servo Control and Grasp Detection for Autonomous Robotic Manipulation

To explore robotic grasping in unstructured and dynamic environments, this work addresses the visual perception phase of the task. This phase processes visual data to obtain the location of the object to be grasped, its pose, and the points at which the robot's grippers must make contact to ensure a stable grasp. To this end, the Cornell Grasping dataset is used to train a convolutional neural network that, given an image of the robot's workspace containing an object, predicts a grasp rectangle representing the position, orientation, and opening of the robot's grippers before they close. In addition to this network, which runs in real time, a second network is designed to handle situations in which the object moves in the environment. This second network is trained to perform visual servo control, ensuring that the object remains in the robot's field of view. It predicts the proportional values of the linear and angular velocities that the camera must have so that the object always stays in the image processed by the grasp network. The dataset used for training was generated automatically by a Kinova Gen3 manipulator. The same robot is used to evaluate real-time applicability and to obtain practical results from the designed algorithms. The offline results obtained on validation sets are also analyzed and discussed in terms of efficiency and processing speed. The developed controller achieved millimeter accuracy in the final position for a target object seen for the first time. To the best of our knowledge, no other work in the literature achieves such precision with a controller learned from scratch. Thus, this work presents a new system for autonomous robotic manipulation with high processing speed and the ability to generalize to several different objects.
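
To make the grasp rectangle mentioned in the abstract concrete, the following minimal sketch shows one common way to encode it: a center point, an orientation, the gripper opening, and the jaw size, all in image coordinates. The class name `GraspRectangle`, its field names, and the five-parameter layout are assumptions chosen for illustration; the abstract does not state the exact parameterization the network outputs.

```python
import math
from dataclasses import dataclass


@dataclass
class GraspRectangle:
    """Hypothetical five-parameter grasp rectangle in image coordinates."""
    x: float         # center column, in pixels
    y: float         # center row, in pixels
    theta: float     # orientation of the gripper axis w.r.t. the image x-axis, in radians
    opening: float   # gripper opening (rectangle length along the gripper axis), in pixels
    jaw_size: float  # gripper plate width (rectangle length across the axis), in pixels

    def corners(self):
        """Return the four corner points (x, y) of the rotated rectangle."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        half_open, half_jaw = self.opening / 2, self.jaw_size / 2
        # Corner offsets in the rectangle's own frame, before rotation.
        offsets = [
            (-half_open, -half_jaw),
            (half_open, -half_jaw),
            (half_open, half_jaw),
            (-half_open, half_jaw),
        ]
        # Rotate each offset by theta and translate it to the center.
        return [
            (self.x + dx * c - dy * s, self.y + dx * s + dy * c)
            for dx, dy in offsets
        ]
```

For example, `GraspRectangle(100, 50, 0.0, 40, 20).corners()` yields an axis-aligned rectangle spanning columns 80 to 120 and rows 40 to 60. Such corner points are convenient for drawing a predicted grasp over the workspace image or for comparing it against annotated rectangles.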
