Word embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages. To gain further insight into word embeddings, we explore their stability (e.g., the overlap between the nearest neighbors of a word in different embedding spaces) across diverse languages. We discuss linguistic properties that are related to stability, drawing out insights about correlations with affixing, grammatical gender systems, and other features. These findings have implications for how embeddings are used, particularly in research that relies on them to study language trends.
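The notion of stability mentioned above, the overlap between a word's nearest neighbors in different embedding spaces, can be illustrated with a minimal sketch. The abstract does not specify the exact measure, so the choices here are assumptions made for illustration only: cosine similarity as the distance, a fixed neighborhood size `k` (smaller than the vocabulary), a vocabulary shared by both spaces, and the hypothetical helper names `nearest_neighbors` and `stability`. The toy vectors are invented and are not data from the paper.

```python
import numpy as np


def nearest_neighbors(emb, vocab, word, k):
    """Return the set of the k words closest to `word` by cosine similarity.

    Assumes k < len(vocab) and that row i of `emb` is the vector for vocab[i].
    """
    idx = vocab.index(word)
    # Normalize rows so that dot products equal cosine similarities.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # exclude the query word itself
    top = np.argsort(-sims)[:k]
    return {vocab[i] for i in top}


def stability(emb_a, emb_b, vocab, word, k=10):
    """Fraction of the k nearest neighbors of `word` shared by both spaces."""
    neighbors_a = nearest_neighbors(emb_a, vocab, word, k)
    neighbors_b = nearest_neighbors(emb_b, vocab, word, k)
    return len(neighbors_a & neighbors_b) / k


# Toy example (invented vectors): two 2-dimensional spaces over one vocabulary.
vocab = ["cat", "dog", "car", "bus", "tree"]
emb_a = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]])
emb_b = np.array([[1.0, 0.0], [0.1, 0.9], [0.0, 1.0], [0.9, 0.1], [0.7, 0.7]])

print(stability(emb_a, emb_a, vocab, "cat", k=2))  # identical spaces: 1.0
print(stability(emb_a, emb_b, vocab, "cat", k=2))  # one shared neighbor: 0.5
```

In the second call, "cat" has neighbors {dog, tree} in the first space and {bus, tree} in the second, so only one of the two neighbors is shared. A value near 1 would indicate that a word's neighborhood is largely preserved across spaces, and a value near 0 that it changes almost entirely.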