Variation in speech is often quantified by comparing phonetic transcriptions of the same utterance. However, manually transcribing speech is time-consuming and error-prone. As an alternative, we therefore investigate extracting acoustic embeddings from several self-supervised neural models. We use these representations to compute word-based pronunciation differences between non-native and native speakers of English, and between Norwegian dialect speakers. For comparison with several earlier studies, we evaluate how well these differences match human perception by comparing them with available human judgements of similarity. We show that speech representations extracted from a specific type of neural model (i.e. Transformers) match human perception more closely than two earlier approaches based on phonetic transcriptions and MFCC-based acoustic features. We furthermore find that features from the neural models are generally best extracted from one of the middle hidden layers rather than from the final layer. We also demonstrate that neural speech representations not only capture segmental differences, but also intonational and durational differences that cannot adequately be represented by the set of discrete symbols used in phonetic transcriptions.
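To make the described approach concrete, the following is a minimal sketch of how word-level pronunciation differences could be computed from hidden-layer representations of a self-supervised Transformer model. The checkpoint name, the chosen layer index, and the use of length-normalised dynamic time warping are illustrative assumptions, not necessarily the exact pipeline evaluated in this work.

```python
# Sketch: compare two word recordings via hidden-layer embeddings of a
# self-supervised Transformer model (assumed checkpoint and layer choice).
import torch
import librosa
import numpy as np
from scipy.spatial.distance import cdist
from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor

MODEL = "facebook/wav2vec2-large-960h"  # illustrative checkpoint, not necessarily the one used here
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL)
model = Wav2Vec2Model.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def embed(wav_path, layer=10):
    """Frame-level embeddings from one hidden layer (middle layers often work best)."""
    audio, _ = librosa.load(wav_path, sr=16000)
    inputs = extractor(audio, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # hidden_states[0] is the CNN feature encoder output; index i selects the i-th Transformer block
    return out.hidden_states[layer].squeeze(0).numpy()

def dtw_distance(a, b):
    """Length-normalised dynamic time warping distance between two embedding sequences."""
    cost = cdist(a, b, metric="euclidean")
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)

# Hypothetical usage: pronunciation difference between two realisations of the same word
# dist = dtw_distance(embed("speaker_a_word.wav"), embed("speaker_b_word.wav"))
```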