In this paper, we are concerned with the phenomenon of function word polysemy. We adopt the framework of distributional semantics, which characterizes word meaning by observing the contexts in which words occur in large corpora and which is, in principle, well suited to modeling polysemy. Nevertheless, function words have traditionally been considered impossible to analyze distributionally because of their highly flexible usage patterns. We show that contextualized word embeddings, the most recent generation of distributional methods, offer hope in this regard. Using the German reflexive pronoun 'sich' as an example, we find that contextualized word embeddings capture theoretically motivated word senses of 'sich' to the extent that these senses are mirrored systematically in linguistic usage.