Variation in speech is often represented and investigated using phonetic transcriptions, but transcribing speech is time-consuming and error-prone. As an alternative representation, therefore, we investigate the extraction of acoustic embeddings from several self-supervised neural models. We use these representations to compute word-based pronunciation differences between non-native and native speakers of English, and between different dialect pronunciations, and evaluate these differences by comparing them with available human native-likeness judgments. We show that Transformer-based speech representations lead to significant performance gains over the use of phonetic transcriptions, and find that feature-based use of Transformer models is most effective with one of the middle layers rather than the final layer. We also demonstrate that these neural speech representations capture not only segmental differences, but also intonational and durational differences that cannot be represented by the set of discrete symbols used in phonetic transcriptions.