Generating realistic lip motion from audio to simulate speech production is critical for driving natural character animation. Previous research has shown that the traditional metrics used to optimize and assess models that generate lip motion from speech are not good indicators of subjective judgments of animation quality. Devising metrics that align with subjective opinion first requires understanding what affects human perception of quality. In this work, we focus on the degree of articulation and run a series of experiments to study how articulation strength impacts human perception of lip motion accompanying speech. Specifically, we study how increasing degrees of under-articulated (dampened) and over-articulated (exaggerated) lip motion affect human perception of quality. We examine the impact of articulation strength on human perception both when considering lip motion alone, where viewers are presented with talking faces represented by landmarks, and in the context of embodied characters, where viewers are presented with photo-realistic videos. Our results show that viewers consistently prefer over-articulated lip motion to under-articulated lip motion, and that this preference generalizes across different speakers and embodiments.
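The abstract does not specify how dampened or exaggerated lip motion is produced. One simple way to picture the manipulation, assuming lip motion is a sequence of 2D landmarks and that articulation strength is controlled by a single scale factor, is to scale each landmark's displacement from its mean position over the clip. The function `scale_articulation` and the factor `alpha` are hypothetical illustrations, not the authors' procedure.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Frame = List[Point]  # one (x, y) position per lip landmark


def scale_articulation(frames: List[Frame], alpha: float) -> List[Frame]:
    """Hypothetical sketch: scale each landmark's displacement from its
    temporal mean by alpha.

    alpha < 1 dampens (under-articulates), alpha > 1 exaggerates
    (over-articulates), and alpha == 1 leaves the motion unchanged.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_points = len(frames[0])
    if any(len(frame) != n_points for frame in frames):
        raise ValueError("every frame must have the same number of landmarks")

    # Per-landmark mean position over the sequence, used as the neutral pose.
    mean = [
        (
            sum(frame[i][0] for frame in frames) / n_frames,
            sum(frame[i][1] for frame in frames) / n_frames,
        )
        for i in range(n_points)
    ]

    # Move each landmark toward (alpha < 1) or away from (alpha > 1) its mean.
    return [
        [(mx + alpha * (x - mx), my + alpha * (y - my))
         for (x, y), (mx, my) in zip(frame, mean)]
        for frame in frames
    ]


# Two frames of a toy two-landmark "mouth": the second landmark opens from y=1 to y=3.
frames = [[(0.0, 0.0), (0.0, 1.0)], [(0.0, 0.0), (0.0, 3.0)]]
dampened = scale_articulation(frames, 0.5)     # opening range shrinks to 1.5..2.5
exaggerated = scale_articulation(frames, 2.0)  # opening range grows to 0.0..4.0
```

Scaling around the per-landmark mean keeps the average mouth shape fixed while changing only the amplitude of the motion, which is one plausible reading of "dampened" versus "exaggerated" articulation.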