Reinforcement learning has been demonstrated to be a flexible and effective approach for learning a range of continuous control tasks, such as those robots use to manipulate objects in their environment. In robotics particularly, however, real-world rollouts are costly, and sample efficiency can be a major limiting factor when learning a new skill. In game environments, world models have been shown to improve sample efficiency while still achieving strong performance, especially when images or other rich observations are available. In this project, we explore the use of a world model in a deformable-object robotic manipulation task, evaluating its effect on sample efficiency when learning to fold a cloth in simulation. We compare RGB image observations with a feature space that leverages built-in structure (keypoints representing the cloth configuration), a common approach in robot skill learning, and measure the impact on task performance and learning efficiency with and without the world model. Our experiments show that using keypoints increased the performance of the best model on the task by 50%, and that, in general, a learned or constructed reduced feature space improved both task performance and sample efficiency. The use of a state-transition predictor (MDN-RNN) in our world models did not have a notable effect on task performance.
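The state-transition predictor mentioned above, an MDN-RNN, is the standard component from the world-models framework: a recurrent network whose output parameterizes a Gaussian mixture over the next latent state given the current latent state and action. The following is a minimal sketch of a single MDN-RNN step; all dimensions, parameter names, and the plain Elman recurrence are illustrative assumptions, not the project's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's settings):
Z, A, H, K = 4, 2, 8, 3  # latent dim, action dim, hidden dim, mixture components

# Simple Elman-style recurrence (stand-in for the LSTM typically used).
Wxh = rng.normal(0, 0.1, (H, Z + A))
Whh = rng.normal(0, 0.1, (H, H))
bh = np.zeros(H)

# MDN head: mixture logits, component means, and log-stddevs.
Wpi = rng.normal(0, 0.1, (K, H));     bpi = np.zeros(K)
Wmu = rng.normal(0, 0.1, (K * Z, H)); bmu = np.zeros(K * Z)
Wls = rng.normal(0, 0.1, (K * Z, H)); bls = np.zeros(K * Z)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mdn_rnn_step(z, a, h):
    """One step: mixture parameters of p(z_{t+1} | z_t, a_t, h_t) plus new hidden state."""
    x = np.concatenate([z, a])
    h_next = np.tanh(Wxh @ x + Whh @ h + bh)
    pi = softmax(Wpi @ h_next + bpi)                   # (K,) mixture weights
    mu = (Wmu @ h_next + bmu).reshape(K, Z)            # (K, Z) component means
    sigma = np.exp(Wls @ h_next + bls).reshape(K, Z)   # (K, Z) positive stddevs
    return pi, mu, sigma, h_next

def sample_next_z(pi, mu, sigma):
    """Draw a next-latent sample: pick a component, then sample its diagonal Gaussian."""
    k = rng.choice(K, p=pi)
    return mu[k] + sigma[k] * rng.normal(size=Z)

# One rollout step from a random latent/action with a zero-initialized hidden state.
z, a, h = rng.normal(size=Z), rng.normal(size=A), np.zeros(H)
pi, mu, sigma, h = mdn_rnn_step(z, a, h)
z_next = sample_next_z(pi, mu, sigma)
```

In the world-models setup, rolling `mdn_rnn_step` forward on imagined latents is what lets a controller train with fewer environment interactions, which is the sample-efficiency question this project evaluates.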