Getting up from an arbitrary fallen state is a basic human skill. Existing methods for learning this skill often generate highly dynamic and erratic get-up motions, which do not resemble human get-up strategies, or are based on tracking recorded human get-up motions. In this paper, we present a staged approach using reinforcement learning, without recourse to motion capture data. The method first takes advantage of a strong character model, which facilitates the discovery of solution modes. A second stage then learns to adapt the control policy to work with progressively weaker versions of the character. Finally, a third stage learns control policies that can reproduce the weaker get-up motions at much slower speeds. We show that across multiple runs, the method can discover a diverse variety of get-up strategies, and execute them at a variety of speeds. The results usually produce policies that use a final stand-up strategy that is common to the recovery motions seen from all initial states. However, we also find policies for which different strategies are seen for prone and supine initial fallen states. The learned get-up control strategies often have significant static stability, i.e., they can be paused at a variety of points during the get-up motion. We further test our method on novel constrained scenarios, such as having a leg and an arm in a cast.
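The three-stage curriculum described above can be summarized with a short sketch. Everything below is illustrative: the environment factory, the training routine, and the strength/speed schedules are placeholders I am assuming for exposition, not the authors' actual implementation or hyperparameters.

```python
# Hypothetical sketch of the staged get-up curriculum described in the abstract.
# make_env, train_policy, and the schedule values are illustrative stand-ins only.

def make_env(strength_scale: float, speed_scale: float):
    """Return a get-up environment with scaled actuator strength and motion speed."""
    return {"strength": strength_scale, "speed": speed_scale}  # stand-in for a physics env

def train_policy(env, init_policy=None):
    """Stand-in for an RL training run (e.g., a policy-gradient method) on the environment."""
    return {"env": env, "init": init_policy}  # stand-in for a trained policy

# Stage 1: discover get-up solution modes using a strong character.
policy = train_policy(make_env(strength_scale=1.0, speed_scale=1.0))

# Stage 2: adapt the policy to progressively weaker versions of the character.
for strength in (0.9, 0.8, 0.7, 0.6):
    policy = train_policy(make_env(strength, speed_scale=1.0), init_policy=policy)

# Stage 3: learn to reproduce the weak-character get-up motion at much slower speeds.
for speed in (0.75, 0.5, 0.25):
    policy = train_policy(make_env(strength_scale=0.6, speed_scale=speed), init_policy=policy)
```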