Virtual character animation control is a problem for which Reinforcement Learning (RL) is a viable approach. While current work has applied RL effectively to portray physics-based skills, social behaviours are challenging to design reward functions for, because they lack physical interaction with the world. On the other hand, data-driven implementations of these skills have been limited to supervised learning methods, which require extensive training data and carry constraints on generalisability. In this paper, we propose RLAnimate, a novel data-driven deep RL approach to address this challenge, which combines the strengths of RL with the ability to learn from a motion dataset when creating agents. We formalise a mathematical structure for training agents by refining the conceptual roles of elements such as agents, environments, states and actions, in a way that leverages attributes of the character animation domain and model-based RL. An agent trained using our approach learns versatile animation dynamics to portray multiple behaviours, using an iterative RL training process that becomes aware of valid behaviours via representations learnt from motion capture clips. We demonstrate, by training agents that portray realistic pointing and waving behaviours, that our approach requires significantly less training time and substantially fewer sample episodes to be generated during training than state-of-the-art physics-based RL methods. Also, compared to existing supervised learning-based animation agents, RLAnimate needs only a limited dataset of motion clips to generate representations of valid behaviours during training.