Our native language influences the way we perceive speech sounds, affecting our ability to discriminate non-native sounds. We compare two ideas about the influence of the native language on speech perception: the Perceptual Assimilation Model, which appeals to a mental classification of sounds into native phoneme categories, versus the idea that rich, fine-grained phonetic representations tuned to the statistics of the native language are sufficient. We operationalize the latter idea using representations from two state-of-the-art speech models: a Dirichlet process Gaussian mixture model and the more recent wav2vec 2.0 model. We present a new, open dataset of French- and English-speaking participants' speech perception behaviour for 61 vowel sounds from six languages. We show that phoneme assimilation is a better predictor than fine-grained phonetic modelling, both for discrimination behaviour as a whole and for predicting differences in discriminability associated with differences in native language background. We also show that wav2vec 2.0, while poor at capturing the effects of native language on speech perception, is complementary to information about native phoneme assimilation and provides a good model of low-level phonetic representations, supporting the idea that both categorical and fine-grained perception are used during speech perception.