We propose to leverage a real-world, human-activity RGB dataset to teach a robot Task-Oriented Grasping (TOG). We develop a model that takes an RGB image as input and outputs a hand pose and configuration as well as an object pose and shape. We follow the insight that jointly estimating hand and object poses is more accurate than estimating these quantities independently. Given the trained model, we process an RGB dataset to automatically obtain the data to train a TOG model, which takes an object point cloud as input and outputs a region suitable for task-specific grasping. Our ablation study shows that training the object pose predictor with hand pose information (and vice versa) outperforms training without this information. Furthermore, our results on a real-world dataset show that our method is applicable and competitive with the state of the art. Experiments with a robot demonstrate that our method enables a robot to perform TOG on novel objects.
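To make the two-stage pipeline concrete, below is a minimal sketch (not the authors' implementation) of the interfaces it implies, assuming PyTorch: a joint hand-object network that regresses hand pose/configuration and object pose/shape from a shared RGB feature, and a TOG network that scores points of an object point cloud for task-specific grasping. All class names, tensor shapes, and output dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn


class JointHandObjectNet(nn.Module):
    """Stage 1 (illustrative): from an RGB image, jointly regress a hand pose and
    configuration and an object pose and shape from a shared feature, so each
    branch can exploit the other's context."""
    def __init__(self, hand_dim=51, obj_pose_dim=7, obj_shape_dim=10):
        super().__init__()
        self.backbone = nn.Sequential(  # tiny CNN stand-in for a real image backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.hand_head = nn.Linear(64, hand_dim)                        # hand pose + joint configuration
        self.object_head = nn.Linear(64, obj_pose_dim + obj_shape_dim)  # object pose + shape

    def forward(self, rgb):
        feat = self.backbone(rgb)  # shared features couple the two estimates
        return self.hand_head(feat), self.object_head(feat)


class TOGRegionNet(nn.Module):
    """Stage 2 (illustrative): from an object point cloud, score each point's
    suitability for a task-specific grasp (PointNet-style per-point MLP)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, points):               # points: (B, N, 3)
        return self.mlp(points).squeeze(-1)  # (B, N) per-point grasp-region scores


if __name__ == "__main__":
    stage1 = JointHandObjectNet()
    hand, obj = stage1(torch.randn(1, 3, 224, 224))  # dummy RGB frame
    stage2 = TOGRegionNet()
    scores = stage2(torch.randn(1, 1024, 3))         # dummy object point cloud
    print(hand.shape, obj.shape, scores.shape)
```

In such a setup, stage 1 would be run over the human-activity RGB dataset to label contact regions automatically, and those labels would supervise stage 2.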