A GAN-Like Approach for Physics-Based Imitation Learning and Interactive Character Control

We present a simple and intuitive approach for interactive control of physically simulated characters. Our work builds upon generative adversarial networks (GANs) and reinforcement learning, and introduces an imitation learning framework in which an ensemble of classifiers and an imitation policy are trained in tandem on pre-processed reference clips. The classifiers are trained to discriminate the reference motion from the motion generated by the imitation policy, while the policy is rewarded for fooling the discriminators. Using our GAN-based approach, multiple motor control policies can be trained separately to imitate different behaviors. At runtime, our system can respond to external control signals provided by the user and interactively switch between different policies. Compared to existing methods, our proposed approach has the following attractive properties: 1) it achieves state-of-the-art imitation performance without manually designing and fine-tuning a reward function; 2) it directly controls the character without having to track any target reference pose, either explicitly or implicitly through a phase state; and 3) it supports interactive policy switching without requiring any motion generation or motion matching mechanism. We highlight the applicability of our approach in a range of imitation and interactive control tasks, while also demonstrating its ability to withstand external perturbations and to recover balance. Overall, our approach generates high-fidelity motion, has low runtime cost, and can be easily integrated into interactive applications and games.
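
The adversarial setup described above (an ensemble of classifiers trained to separate reference motion from policy-generated motion, with the policy rewarded for fooling them) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the linear logistic discriminators, the names `LinearDiscriminator` and `imitation_reward`, the use of the ensemble's mean predicted probability as the reward, and the random vectors standing in for motion features. It is not the paper's network architecture, loss, or reward formulation, and the reinforcement learning update of the policy is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LinearDiscriminator:
    """Hypothetical stand-in for one classifier in the ensemble."""

    def __init__(self, dim, rng):
        self.w = rng.normal(scale=0.1, size=dim)
        self.b = 0.0

    def prob_real(self, x):
        # Predicted probability that each row of x comes from the reference clips.
        return sigmoid(x @ self.w + self.b)

    def update(self, real, fake, lr=0.1):
        # One gradient step on binary cross-entropy (reference -> 1, policy -> 0).
        x = np.vstack([real, fake])
        y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
        grad_logit = self.prob_real(x) - y
        self.w -= lr * (x.T @ grad_logit) / len(x)
        self.b -= lr * grad_logit.mean()


def imitation_reward(ensemble, x):
    """Assumed reward: mean over the ensemble of the predicted 'reference' probability.

    The value is high when the discriminators are fooled into treating x as
    reference motion, and it always lies in [0, 1].
    """
    probs = np.stack([d.prob_real(x) for d in ensemble])  # shape (n_disc, batch)
    return probs.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 4
    # Toy feature vectors standing in for reference and policy-generated motion.
    reference = rng.normal(loc=2.0, scale=0.5, size=(64, dim))
    generated = rng.normal(loc=-2.0, scale=0.5, size=(64, dim))

    ensemble = [LinearDiscriminator(dim, rng) for _ in range(3)]
    for _ in range(50):
        for d in ensemble:
            d.update(reference, generated)

    print("mean reward, reference-like samples:", imitation_reward(ensemble, reference).mean())
    print("mean reward, generated samples:", imitation_reward(ensemble, generated).mean())
```

In a full training loop, the reward returned by `imitation_reward` would be fed to a reinforcement learning algorithm that updates the policy, so that the generated motion drifts toward the reference distribution while the discriminators keep being retrained on the new samples.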
