Training robot manipulation policies is a challenging and open problem in robotics and artificial intelligence. In this paper we propose a novel and compact state representation based on the rewards predicted from an image-based task success classifier. Our experiments, using the Pepper robot in simulation with two deep reinforcement learning algorithms on a grab-and-lift task, reveal that our proposed state representation can achieve up to 97% task success using our best policies.
翻译:培训机器人操纵政策在机器人和人工智能领域是一个具有挑战性和开放性的问题。 在本文中,我们基于一个基于图像的任务成功分类师所预测的奖赏,提出了一个新的和紧凑的国家代表制。 我们的实验用两个深强化的学习算法模拟“抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓抓 ”, 表明我们拟议的国家代表制可以达到97%的任务成功率。