Pre-trained word representations have become a key component in many NLP tasks. However, the global geometry of word embeddings remains poorly understood. In this paper, we demonstrate that a typical cloud of word embeddings is shaped as a high-dimensional simplex with interpretable vertices, and we propose a simple yet effective method for enumerating these vertices. We show that the proposed method can detect and describe the vertices of the simplex for GloVe and fastText embedding spaces.