Real-time control for robotics is a popular research area in the reinforcement learning (RL) community. Through the use of techniques such as reward shaping, researchers have managed to train online agents across a multitude of domains. Despite these advances, solving goal-oriented tasks still requires complex architectural changes or heavy constraints to be placed on the problem. To address this issue, recent works have explored how curriculum learning can be used to separate a complex task into sequential sub-goals, thereby enabling the learning of a problem that may otherwise be too difficult to learn from scratch. In this article, we present how curriculum learning, reward shaping, and a high number of efficiently parallelized environments can be coupled together to solve the problem of stacking multiple cubes. Finally, we extend the best identified configuration to a higher-complexity environment with differently shaped objects.