How do neural networks "perceive" speech sounds from unknown languages? Does the typological similarity between a model's training language (L1) and an unknown language (L2) affect the model's representations of L2 speech signals? To answer these questions, we present a novel experimental design based on representational similarity analysis (RSA) to analyze acoustic word embeddings (AWEs) -- vector representations of variable-duration spoken-word segments. First, we train monolingual AWE models on seven Indo-European languages with varying degrees of typological similarity. We then employ RSA to quantify cross-lingual similarity by simulating native and non-native spoken-word processing with AWEs. Our experiments show that typological similarity indeed affects the representational similarity of the models in our study. We further discuss the implications of our findings for modeling speech processing and language similarity with neural networks.
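The core RSA step described above can be illustrated with a short sketch. This is not the paper's implementation, only a minimal illustration of the standard RSA recipe: build a representational dissimilarity matrix (RDM) of pairwise distances over the same word set in each embedding space, then correlate the two RDMs with Spearman's rho. The function name `rsa_similarity` and the toy data are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rsa_similarity(embeddings_a, embeddings_b, metric="cosine"):
    """Representational similarity between two embedding spaces.

    Both arrays hold embeddings of the same n items (rows aligned),
    possibly with different dimensionality. We compute a condensed
    representational dissimilarity matrix (RDM) for each space and
    correlate the two RDMs with Spearman's rank correlation.
    """
    rdm_a = pdist(embeddings_a, metric=metric)  # condensed upper triangle
    rdm_b = pdist(embeddings_b, metric=metric)
    rho, _ = spearmanr(rdm_a, rdm_b)
    return rho


# Toy check: cosine distances are invariant to positive rescaling,
# so a space and its rescaled copy are representationally identical.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 16))
identical = rsa_similarity(x, 2.0 * x)
unrelated = rsa_similarity(x, rng.normal(size=(50, 16)))
```

In the paper's setting, the rows of the two arrays would be AWEs of the same spoken words produced by models trained on different L1s, so the resulting rho serves as a cross-lingual similarity score.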