Real-time control for robotics is a popular research area in the reinforcement learning community. Using techniques such as reward shaping, researchers have trained online agents across many domains. Despite these advances, solving goal-oriented tasks still requires complex architectural changes or hard constraints on the problem. In this article, we solve the problem of stacking multiple cubes by combining curriculum learning, reward shaping, and a large number of efficiently parallelized environments. We introduce two curriculum learning settings that split the complex task into sequential sub-goals, making it possible to learn a problem that might otherwise be too difficult. We focus on the challenges encountered while implementing these settings in a goal-conditioned environment. Finally, we extend the best-performing configuration to a more complex environment with differently shaped objects.
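To make the idea of sequential sub-goals concrete, here is a minimal Python sketch of one possible success-gated curriculum for cube stacking, paired with a simple distance-based shaped reward. It is not the authors' method: the abstract specifies neither of the two curriculum settings nor the reward. The class `StackingCurriculum`, the function `shaped_reward`, and the parameters `threshold`, `window`, and `tol` are hypothetical, and their default values are illustrative assumptions.

```python
import math
from collections import deque


class StackingCurriculum:
    """Hypothetical success-gated curriculum: stage k asks the agent to stack k cubes."""

    def __init__(self, max_cubes=3, threshold=0.8, window=100):
        self.max_cubes = max_cubes    # final stage = full stacking task (assumed)
        self.threshold = threshold    # success rate needed to advance (assumed value)
        self.stage = 1                # start with the easiest sub-goal: one cube
        self.outcomes = deque(maxlen=window)  # most recent episode outcomes only

    def record(self, success: bool) -> None:
        """Log one episode outcome and advance the stage once the window is full
        and the success rate over it reaches the threshold."""
        self.outcomes.append(success)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        rate = sum(self.outcomes) / len(self.outcomes)
        if window_full and rate >= self.threshold and self.stage < self.max_cubes:
            self.stage += 1
            self.outcomes.clear()  # re-estimate success from scratch on the new stage


def shaped_reward(cube_positions, goal_positions, stage, tol=0.02):
    """Hypothetical dense reward over the cubes active in the current stage:
    negative Euclidean distance to each goal, plus a bonus of 1.0 per cube within tol."""
    reward = 0.0
    for cube, goal in zip(cube_positions[:stage], goal_positions[:stage]):
        d = math.dist(cube, goal)
        reward += -d + (1.0 if d < tol else 0.0)
    return reward
```

In this sketch, each stage adds one cube to the stack, and the agent advances only after its success rate over the last `window` episodes reaches `threshold`. With many parallel environments, outcomes from all of them could be passed to `record`, so the window would fill quickly.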