Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence with hard exploration. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning problem as a much simpler state matching problem. In particular, DPS learns a stable policy from analytical gradients with ground-truth physical priors, leading to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism that enables stable gradient backpropagation over a long horizon. Extensive experiments on standard benchmarks show that DiffMimic achieves better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn a Backflip after 10 minutes of training and to cycle it after 3 hours of training, while the existing approach may require about a day of training to cycle a Backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems with techniques like differentiable cloth simulation in future research.
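To make the core idea concrete, below is a minimal, hypothetical sketch of state matching through a differentiable simulator, written in JAX. It is not the DiffMimic implementation: the point-mass dynamics in `step`, the linear `policy`, and names such as `rollout` and `HORIZON` are illustrative stand-ins (DiffMimic itself builds on a full rigid-body simulator). The sketch only shows how an analytical policy gradient falls out of backpropagating a state-matching loss through the simulation.

```python
# Minimal sketch of state matching through a differentiable simulator.
# NOT the DiffMimic implementation: the toy point-mass dynamics, linear
# policy, and all names here are hypothetical stand-ins.
import jax
import jax.numpy as jnp

DT = 0.02          # simulation time step (assumed)
HORIZON = 64       # rollout length (assumed)
STATE_DIM = 4      # toy state: [position(2), velocity(2)]

def step(state, action):
    """Toy differentiable dynamics: the action drives the velocity."""
    pos, vel = state[:2], state[2:]
    vel = vel + DT * action
    pos = pos + DT * vel
    return jnp.concatenate([pos, vel])

def policy(params, state):
    """Linear policy; a real system would use a neural network."""
    return params @ state

def rollout(params, init_state, ref_states):
    """Unroll the simulator and accumulate the state-matching loss.

    Every operation is differentiable, so jax.grad yields the analytical
    gradient of the loss w.r.t. the policy parameters, instead of the
    high-variance sampled estimate an RL method would use."""
    def body(state, ref):
        action = policy(params, state)
        next_state = step(state, action)
        loss = jnp.sum((next_state - ref) ** 2)  # match the demonstration
        return next_state, loss
    _, losses = jax.lax.scan(body, init_state, ref_states)
    return jnp.mean(losses)

key = jax.random.PRNGKey(0)
params = 0.01 * jax.random.normal(key, (2, STATE_DIM))
init_state = jnp.zeros(STATE_DIM)
ref_states = jnp.zeros((HORIZON, STATE_DIM))  # stand-in reference motion

# Analytical policy gradient through the whole rollout.
grads = jax.grad(rollout)(params, init_state, ref_states)
```

Because the gradient flows through the physics itself, a single rollout yields an exact descent direction, which is the source of the faster and more stable convergence claimed above.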
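The Demonstration Replay mechanism can likewise be sketched. The variant below is an assumption for illustration (a simple drift-threshold rule; `REPLAY_TOL` is hypothetical), not the paper's exact procedure: whenever the simulated state drifts too far from the demonstration, the rollout continues from the demonstration state, and `stop_gradient` cuts the backpropagation path at the replay point so gradients never flow through an unstably long chain of simulation steps. It reuses `policy`, `step`, and the data from the previous sketch.

```python
# Sketch of a Demonstration Replay rule, assuming a drift-threshold
# variant (an assumption, not the paper's exact procedure).
REPLAY_TOL = 0.5   # drift threshold (assumed)

def rollout_with_replay(params, init_state, ref_states):
    def body(state, ref):
        action = policy(params, state)
        next_state = step(state, action)
        loss = jnp.sum((next_state - ref) ** 2)
        # Replay: if drift is large, continue from the demonstration
        # state; stop_gradient truncates backprop at the replay point.
        drift = jnp.linalg.norm(next_state - ref)
        next_state = jnp.where(drift > REPLAY_TOL,
                               jax.lax.stop_gradient(ref),
                               next_state)
        return next_state, loss
    _, losses = jax.lax.scan(body, init_state, ref_states)
    return jnp.mean(losses)

grads = jax.grad(rollout_with_replay)(params, init_state, ref_states)
```

Truncating the gradient chain at replay points keeps long-horizon optimization well-conditioned while still letting each segment of the rollout match the demonstration.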