Humans form mental images of 3D scenes to support counterfactual imagination, planning, and motor control. Our ability to predict the appearance and affordances of a scene from previously unobserved viewpoints aids us in performing manipulation tasks (e.g., 6-DoF kitting) with a level of ease that is currently out of reach for existing robot learning frameworks. In this work, we aim to build artificial systems that can analogously plan actions on top of imagined images. To this end, we introduce Mental Imagery for Robotic Affordances (MIRA), an action reasoning framework that optimizes actions with novel-view synthesis and affordance prediction in the loop. Given a set of 2D RGB images, MIRA builds a consistent 3D scene representation, through which we synthesize novel orthographic views amenable to pixel-wise affordance prediction for action optimization. We illustrate how this optimization process enables us to generalize to unseen out-of-plane rotations for 6-DoF robotic manipulation tasks given a limited number of demonstrations, paving the way toward machines that autonomously learn to understand the world around them for planning actions.
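To make the described pipeline concrete, below is a minimal sketch of the action-optimization loop in Python. The helper names (`fit_scene`, `render_orthographic`, `affordance_net`) are hypothetical placeholders standing in for the paper's components, not its actual API: fitting a 3D scene representation to posed RGB images, rendering orthographic novel views, and scoring each view with a pixel-wise affordance model.

```python
# A minimal sketch of MIRA-style action optimization under the assumptions
# stated above. Given posed RGB images, we (1) fit a 3D scene representation,
# (2) render an orthographic novel view for each candidate out-of-plane
# rotation, (3) score each rendered view with a pixel-wise affordance
# network, and (4) return the pixel/rotation pair with the highest score.
import numpy as np

def plan_action(rgb_images, camera_poses, candidate_rotations,
                fit_scene, render_orthographic, affordance_net):
    """Return the best (pixel, rotation) pair found by the optimization loop.

    fit_scene           -- fits a 3D scene representation (e.g., a NeRF-like
                           model) to posed RGB images (assumed helper).
    render_orthographic -- renders an orthographic novel view of the scene
                           from a virtual camera with the given rotation
                           (assumed helper).
    affordance_net      -- maps an HxWx3 image to an HxW map of per-pixel
                           action success scores (assumed model).
    """
    scene = fit_scene(rgb_images, camera_poses)

    best = (-np.inf, None, None)  # (score, pixel, rotation)
    for rotation in candidate_rotations:
        view = render_orthographic(scene, rotation)   # HxWx3 novel view
        affordance = affordance_net(view)             # HxW score map
        pixel = np.unravel_index(np.argmax(affordance), affordance.shape)
        score = affordance[pixel]
        if score > best[0]:
            best = (score, pixel, rotation)

    _, pixel, rotation = best
    # In an orthographic view, a pixel maps to a 3D position by a fixed
    # scale and offset; combined with the view's rotation, this yields a
    # full 6-DoF action pose.
    return pixel, rotation
```

Orthographic views are what make this loop tractable: because each pixel corresponds to a fixed 3D ray direction regardless of depth, a 2D pixel-wise affordance prediction in a rotated view directly encodes a 6-DoF action hypothesis.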