Owing to growing demand from the film and game industries, 3D avatar animation synthesis has recently attracted considerable attention. In this work, we present a production-ready, text/speech-driven full-body animation synthesis system. Given a text and its corresponding speech, our system synthesizes face and body animations simultaneously; these are then skinned and rendered to produce an output video stream. We adopt a learning-based approach to synthesize facial animation and a graph-based approach to animate the body, which together generate high-quality avatar animation efficiently and robustly. Our results demonstrate that the generated avatar animations are realistic, diverse, and highly correlated with the input text and speech.
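The abstract does not say how the body graph is built or searched. As a rough illustration only, a classic motion graph treats short motion clips as nodes and smooth transitions between them as edges, and body animation is produced by walking the graph. The sketch below assumes the goal is to find a clip sequence whose total length exactly fills a speech segment's duration in frames. The class `MotionGraph`, its `fill` method, the clip names, and the frame counts are all hypothetical and are not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MotionGraph:
    # Hypothetical data: clip name -> clip length in frames (must be > 0).
    durations: dict[str, int]
    # Hypothetical data: clip name -> clips that may follow it smoothly.
    edges: dict[str, list[str]] = field(default_factory=dict)

    def fill(self, start: str, target_frames: int) -> Optional[list[str]]:
        """Depth-first search for a clip sequence beginning at `start`
        whose total length equals `target_frames` exactly.
        Returns None if no such sequence exists."""

        def dfs(clip: str, remaining: int, path: list[str]) -> Optional[list[str]]:
            remaining -= self.durations[clip]
            path.append(clip)
            if remaining == 0:
                return list(path)  # exact fit: return a copy of the path
            if remaining > 0:
                for nxt in self.edges.get(clip, []):
                    result = dfs(nxt, remaining, path)
                    if result is not None:
                        return result
            path.pop()  # overshoot or dead end: backtrack
            return None

        return dfs(start, target_frames, [])


if __name__ == "__main__":
    graph = MotionGraph(
        durations={"idle": 30, "beat": 20, "point": 25},
        edges={"idle": ["beat", "point"], "beat": ["idle", "point"], "point": ["idle"]},
    )
    print(graph.fill("idle", 75))  # a 75-frame speech segment
```

Because every clip has a positive duration, the remaining frame budget strictly decreases along each path, so the search terminates even though the graph contains cycles. A production system would more likely score candidate paths against speech features (for example, placing gesture strokes on stressed words) rather than accept the first exact fit.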