Learning to locomote is one of the most common tasks in physics-based animation and deep reinforcement learning (RL). A learned policy is the product of the problem to be solved, as embodied by the RL environment, and the RL algorithm. While enormous attention has been devoted to RL algorithms, much less is known about the impact of design choices for the RL environment. In this paper, we show that environment design matters in significant ways and document how it can contribute to the brittle nature of many RL results. Specifically, we examine choices related to state representations, initial state distributions, reward structure, control frequency, episode termination procedures, curriculum usage, the action space, and the torque limits. We aim to stimulate discussion around such choices, which in practice strongly impact the success of RL when applied to continuous-action control problems of interest to animation, such as learning to locomote.