Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations

In this paper, we propose an effective method to synthesize speaker-specific speech waveforms by conditioning on videos of an individual's face. Using a generative adversarial network (GAN) with linguistic and speaker characteristic features as auxiliary conditions, our method directly converts face images into speech waveforms under an end-to-end training framework. The linguistic features are extracted from lip movements using a lip-reading model, and the speaker characteristic features are predicted from face images using cross-modal learning with a pre-trained acoustic model. Since these two features are uncorrelated and controlled independently, we can flexibly synthesize speech waveforms whose speaker characteristics vary depending on the input face images. Our method can therefore be regarded as a multi-speaker face-to-speech waveform model. We show that the proposed model outperforms conventional methods in both objective and subjective evaluations. Specifically, we evaluate the performance of the linguistic feature and speaker characteristic generation modules by measuring the accuracy of automatic speech recognition and automatic speaker/gender recognition tasks, respectively. We also evaluate the naturalness of the synthesized speech waveforms using a mean opinion score (MOS) test.
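The independent control of the two conditions described above can be illustrated with a toy sketch. The abstract does not say how the linguistic and speaker features are combined before they reach the generator, so the per-frame concatenation below is an assumption, and the function name `build_generator_conditions`, the feature sizes, and all values are hypothetical. No lip-reading model, acoustic model, or GAN is involved; the sketch only shows that swapping the speaker vector leaves the linguistic part of the condition untouched.

```python
from typing import List


def build_generator_conditions(
    linguistic_frames: List[List[float]],
    speaker_embedding: List[float],
) -> List[List[float]]:
    """Append the same speaker embedding to every frame of linguistic features.

    ASSUMPTION: concatenation is used purely for illustration; the source
    does not state how the two conditions are fused.
    """
    return [frame + speaker_embedding for frame in linguistic_frames]


# Hypothetical toy inputs: three frames of 2-dim linguistic features
# (standing in for lip-reading outputs) and two 3-dim speaker embeddings
# (standing in for vectors predicted from two different faces).
linguistic = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
speaker_a = [1.0, 0.0, 0.0]
speaker_b = [0.0, 1.0, 0.0]

cond_a = build_generator_conditions(linguistic, speaker_a)
cond_b = build_generator_conditions(linguistic, speaker_b)

# The linguistic slice is identical for both speakers; only the speaker slice differs.
same_content = [c[:2] for c in cond_a] == [c[:2] for c in cond_b]
different_speaker = cond_a[0][2:] != cond_b[0][2:]
print(same_content, different_speaker)
```

In this toy setup, changing the face (and hence the speaker vector) changes only the trailing part of each conditioning frame, which is the property the abstract relies on to vary speaker characteristics without altering the spoken content.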

