Generating realistic motions for digital humans is a core but challenging part of computer animation and games, as human motions are both diverse in content and rich in style. While recent deep learning approaches have made significant advances in this domain, they mostly treat motion synthesis and style manipulation as two separate problems. This is mainly due to the difficulty of effectively learning, in a common representation, both motion content, which accounts for inter-class behaviour, and style, which accounts for intra-class behaviour. To tackle this challenge, we propose a denoising diffusion probabilistic model solution for styled motion synthesis. Because diffusion models gain high capacity from the injection of stochasticity, we can represent both inter-class motion content and intra-class style behaviour in the same latent space. This results in an integrated, end-to-end trained pipeline that facilitates the generation of optimal motion and the exploration of the content-style coupled latent space. To achieve high-quality results, we design a multi-task diffusion model architecture that strategically generates aspects of human motion for local guidance. We also design adversarial and physical regulations for global guidance. We demonstrate superior performance with quantitative and qualitative results and validate the effectiveness of our multi-task architecture.