Robots deployed in human-centric environments may need to manipulate a diverse range of articulated objects, such as doors, dishwashers, and cabinets. Articulated objects often come with unexpected articulation mechanisms that are inconsistent with categorical priors: for example, a drawer might rotate about a hinge joint instead of sliding open. We propose a category-independent framework for predicting the articulation models of unknown objects from sequences of RGB-D images. The prediction is performed in two steps: first, a visual perception module tracks object part poses from raw images; second, a factor graph takes these poses and infers the articulation model, including the current configuration between the parts, as a 6D twist. We also propose a manipulation-oriented metric that evaluates predicted joint twists by how well a compliant robot controller could manipulate the articulated object using them. We demonstrate that our visual perception and factor graph modules outperform baselines on simulated data, and we show the applicability of our factor graph on real-world data.
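To make the 6D-twist representation of an articulation model concrete, the following is a minimal, illustrative Python sketch (not the paper's implementation) of how a predicted twist can parameterize the relative motion between two object parts via the SE(3) exponential map. All names and numerical values here are assumptions chosen for the example.

```python
import numpy as np


def skew(w):
    """3x3 skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def twist_to_transform(v, w, theta):
    """Exponential map of a 6D twist (v, w), scaled by the joint variable theta.

    Returns the 4x4 homogeneous transform of one part relative to the other
    as the joint is articulated. Assumes w is either zero (prismatic joint)
    or a unit rotation axis (revolute/helical joint).
    """
    T = np.eye(4)
    if np.allclose(w, 0.0):
        # Prismatic joint: pure translation along v.
        T[:3, 3] = v * theta
        return T
    W = skew(w)
    # Rotation part via Rodrigues' formula.
    R = np.eye(3) + np.sin(theta) * W + (1.0 - np.cos(theta)) * (W @ W)
    # Translation part of the SE(3) exponential.
    G = np.eye(3) * theta + (1.0 - np.cos(theta)) * W + (theta - np.sin(theta)) * (W @ W)
    T[:3, :3] = R
    T[:3, 3] = G @ v
    return T


# Example: a "drawer" that unexpectedly rotates about a hinge (revolute twist).
w = np.array([0.0, 0.0, 1.0])    # hypothetical hinge axis (unit vector)
p = np.array([0.4, 0.0, 0.0])    # hypothetical point on the hinge axis
v = -np.cross(w, p)              # linear component of the revolute twist
print(twist_to_transform(v, w, np.pi / 4))  # relative pose at 45 degrees of opening
```

A compliant controller could use such a transform (or its instantaneous velocity form) to command motion along the predicted joint axis, which is the intuition behind evaluating predicted twists by how well they support manipulation rather than by geometric error alone.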