Data-driven motion priors that can guide agents toward producing naturalistic behaviors play a pivotal role in creating life-like virtual characters. Adversarial imitation learning has been a highly effective method for learning motion priors from reference motion data. However, adversarial priors, with few exceptions, must be retrained for each new controller, which limits their reusability and requires retaining the reference motion data when training on downstream tasks. In this work, we present Score-Matching Motion Priors (SMP), which leverage pre-trained motion diffusion models and score distillation sampling (SDS) to create reusable, task-agnostic motion priors. SMPs can be pre-trained on a motion dataset, independent of any control policy or task. Once trained, SMPs can be kept frozen and reused as general-purpose reward functions to train policies to produce naturalistic behaviors for downstream tasks. We show that a general motion prior trained on large-scale datasets can be repurposed into a variety of style-specific priors. Furthermore, SMPs can compose different styles to synthesize new styles not present in the original dataset. Our method produces high-quality motion comparable to state-of-the-art adversarial imitation learning methods through reusable and modular motion priors. We demonstrate the effectiveness of SMP across a diverse suite of control tasks with physically simulated humanoid characters. Video demo available at https://youtu.be/ravlZJteS20
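To make the core idea concrete, here is a minimal, hedged sketch of how a frozen diffusion model can serve as a general-purpose reward. All names (`toy_score_model`, `sds_reward`) and the closed-form toy denoiser are illustrative assumptions, not the paper's actual architecture or objective: the policy's motion is noised, the frozen model predicts the added noise, and the motion is rewarded when that prediction is accurate, i.e. when the motion looks like it came from the training distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_score_model(x_t, t):
    """Stand-in for a pretrained, frozen motion diffusion denoiser.
    It predicts the noise that was added to a clean motion sample.
    Toy assumption: clean motions are concentrated near the origin
    (a Gaussian prior), giving a simple closed-form prediction."""
    return x_t / np.sqrt(1.0 + t)

def sds_reward(motion, t=0.5):
    """SDS-style reward (illustrative): noise the policy's motion,
    ask the frozen model to reconstruct that noise, and return the
    negative prediction error. In-distribution motions score higher;
    no reference motion data or discriminator retraining is needed."""
    eps = rng.standard_normal(motion.shape)          # forward-process noise
    x_t = np.sqrt(1.0 - t) * motion + np.sqrt(t) * eps  # noised motion
    eps_hat = toy_score_model(x_t, t)                # frozen prior's guess
    return -float(np.mean((eps_hat - eps) ** 2))
```

Under this toy prior, a motion near the data distribution (here, near the origin) receives a higher reward than an out-of-distribution one, which is the property that lets the frozen prior steer a downstream policy toward naturalistic behavior.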