Deep imitation learning is a promising approach to autonomous robot manipulation because it does not require hard-coded control rules. Current applications of deep imitation learning to robot manipulation have been limited to reactive control based on the state at the current time step. However, future robots will also be required to solve tasks using memory acquired through experience in complicated environments (e.g., when a robot is asked to find a previously used object on a shelf). In such situations, simple deep imitation learning may fail because of distractions caused by the complicated environment. We propose that gaze prediction from sequential visual input enables the robot to perform manipulation tasks that require memory. The proposed algorithm uses a Transformer-based self-attention architecture for gaze estimation from sequential data to implement memory. The proposed method was evaluated on a real-robot multi-object manipulation task that requires memory of previous states.
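The core mechanism referenced above, self-attention over a sequence of visual features so that earlier frames can inform the current gaze prediction, can be illustrated with a minimal NumPy sketch. This is a toy illustration only: the feature dimensions, random weights, and the final linear projection to a 2-D gaze coordinate are assumptions for demonstration, not the paper's actual architecture or trained parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax for attention weights
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (T, d) sequence of per-frame visual features
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) pairwise frame similarities
    return softmax(scores) @ V               # each step attends over all frames

rng = np.random.default_rng(0)
T, d = 8, 16                      # toy sizes: 8 frames, 16-dim features
X = rng.normal(size=(T, d))       # stand-in for extracted image features
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W_out = rng.normal(size=(d, 2))   # hypothetical projection to (x, y) gaze

H = self_attention(X, Wq, Wk, Wv)
gaze = H[-1] @ W_out              # predict gaze from the last time step,
                                  # which has attended to the whole history
print(gaze.shape)
```

Because each output step is a weighted sum over every frame in the sequence, information from earlier observations (e.g., where an object was last seen) remains accessible when predicting the current gaze, which is how self-attention serves as memory here.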