Responsive Listening Head Generation: A Benchmark Dataset and Baseline

We present a new listening head generation benchmark for synthesizing the responsive feedback of a listener (e.g., nod, smile) during a face-to-face conversation. As the indispensable complement to talking head generation, listening head generation has seldom been studied in the literature. Automatically synthesizing listening behavior that actively responds to a talking head is critical to applications such as digital humans, virtual agents, and social robots. In this work, we propose a novel dataset, "ViCo", highlighting listening head generation during a face-to-face conversation. A total of 92 identities (67 speakers and 76 listeners) are involved in ViCo, featuring 483 clips in a paired "speaking-listening" pattern, where listeners show three listening styles based on their attitudes: positive, neutral, and negative. Unlike traditional speech-to-gesture or talking head generation, listening head generation takes as input both the audio and visual signals from the speaker and gives non-verbal feedback (e.g., head motions, facial expressions) in real time. Our dataset supports a wide range of applications such as human-to-human interaction, video-to-video translation, and cross-modal understanding and generation. To encourage further research, we also release a listening head generation baseline conditioned on different listening attitudes. Code & ViCo dataset: https://project.mhzhou.com/vico.
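The paired "speaking-listening" pattern and the identity counts above can be illustrated with a minimal Python sketch. The record layout, the field names (`speaker_id`, `listener_id`, `attitude`), and the helper `identity_stats` are hypothetical and chosen only for illustration; they are not the ViCo file format or the authors' code, and the clips below are toy data.

```python
from dataclasses import dataclass
from enum import Enum


class Attitude(Enum):
    """The three listening attitudes named in the abstract."""
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class PairedClip:
    """Hypothetical record for one speaking-listening clip pair."""
    speaker_id: str
    listener_id: str
    attitude: Attitude


def identity_stats(clips):
    """Count distinct identities by role across a list of paired clips."""
    speakers = {c.speaker_id for c in clips}
    listeners = {c.listener_id for c in clips}
    return {
        "identities": len(speakers | listeners),
        "speakers": len(speakers),
        "listeners": len(listeners),
        "both_roles": len(speakers & listeners),
    }


# Toy data for illustration only; not taken from ViCo.
toy_clips = [
    PairedClip("A", "B", Attitude.POSITIVE),
    PairedClip("B", "C", Attitude.NEUTRAL),
    PairedClip("A", "C", Attitude.NEGATIVE),
]
stats = identity_stats(toy_clips)
print(stats)
```

Under the assumption that every one of the 92 identities is counted as a speaker, a listener, or both, the same set arithmetic applied to the reported figures would imply 67 + 76 - 92 = 51 identities appearing in both roles.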


