Meaning is the foundation stone of intercultural communication. Languages change continuously, and words shift their meanings for various reasons. Semantic divergence in related languages is a key concern of historical linguistics. In this paper we investigate semantic divergence across languages by measuring the semantic similarity of cognate sets in multiple languages. The method we propose is based on cross-lingual word embeddings. We implement and evaluate it on English and five Romance languages, but it can easily be extended to any language pair, requiring only large monolingual corpora for the languages involved and a small bilingual dictionary for the pair. This language-agnostic method facilitates a quantitative analysis of cognate divergence, by computing degrees of semantic similarity between cognate pairs, and provides insights for identifying false friends. As a second contribution, we formulate a straightforward method for detecting false friends, and introduce the notions of "soft false friend" and "hard false friend", as well as a measure of the degree of "falseness" of a false-friend pair. Additionally, we propose an algorithm that outputs suggestions for correcting false friends, which could serve as a helpful tool for language learning or translation.
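To make the idea of scoring cognate pairs in a shared embedding space concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes the two languages' word vectors have already been mapped into one cross-lingual space, uses cosine similarity as the similarity score, flags pairs below a hypothetical threshold of 0.5, and suggests a correction by taking the nearest target-language word. The function names, the threshold, the three-dimensional toy vectors, and the English-French example words are all invented for illustration.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def flag_false_friends(pairs, src_vecs, tgt_vecs, threshold=0.5):
    """Return (src, tgt, similarity) for cognate pairs scoring below threshold.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    flagged = []
    for src, tgt in pairs:
        sim = cosine(src_vecs[src], tgt_vecs[tgt])
        if sim < threshold:
            flagged.append((src, tgt, sim))
    return flagged


def suggest_correction(src_word, src_vecs, tgt_vecs):
    """Suggest the target-language word closest to src_word in the shared space."""
    return max(tgt_vecs, key=lambda w: cosine(src_vecs[src_word], tgt_vecs[w]))


# Toy vectors, invented for illustration and assumed to be already aligned.
en = {
    "library": np.array([1.0, 0.0, 0.1]),
    "family": np.array([0.0, 1.0, 0.0]),
}
fr = {
    "librairie": np.array([0.1, 0.0, 1.0]),      # "bookshop"
    "bibliothèque": np.array([1.0, 0.1, 0.1]),   # "library"
    "famille": np.array([0.0, 1.0, 0.1]),        # "family"
}
cognates = [("library", "librairie"), ("family", "famille")]

for src, tgt, sim in flag_false_friends(cognates, en, fr):
    best = suggest_correction(src, en, fr)
    print(f"{src} / {tgt}: similarity {sim:.2f}, suggested translation: {best}")
```

With real embeddings, the similarity score itself could serve as a rough indicator of how far a cognate pair has diverged; how that relates to the paper's own "falseness" measure and its soft/hard distinction is not specified in the abstract, so the sketch leaves both out.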