Learning representations of words in a continuous space is perhaps the most fundamental task in NLP; however, words interact in ways much richer than vector dot-product similarity can capture. Many relationships between words can be expressed set-theoretically: for example, adjective-noun compounds (e.g., "red cars" $\subseteq$ "cars") and homographs (e.g., "tongue" $\cap$ "body" should be similar to "mouth", while "tongue" $\cap$ "language" should be similar to "dialect") have natural set-theoretic interpretations. Box embeddings are a novel region-based representation that provides the capability to perform these set-theoretic operations. In this work, we provide a fuzzy-set interpretation of box embeddings, and learn box representations of words using a set-theoretic training objective. We demonstrate improved performance on various word similarity tasks, particularly on less common words, and perform a quantitative and qualitative analysis exploring the additional unique expressivity provided by Word2Box.
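To make the set-theoretic intuition concrete, the sketch below (an illustrative toy, not the paper's training procedure) represents each word as an axis-aligned box given by per-dimension min/max coordinates; the intersection of two boxes is again a box, and its volume gives a soft measure of overlap, as in the "red cars" $\subseteq$ "cars" example. The box coordinates here are hypothetical values chosen only for illustration.

```python
import numpy as np

def box_volume(lo, hi):
    # Volume of an axis-aligned box; clipping gives zero volume
    # when the box is empty along any dimension.
    return float(np.prod(np.clip(hi - lo, 0.0, None)))

def box_intersection(lo1, hi1, lo2, hi2):
    # The intersection of two axis-aligned boxes is itself a box:
    # take the elementwise max of the minima and min of the maxima.
    return np.maximum(lo1, lo2), np.minimum(hi1, hi2)

# Hypothetical 2-d boxes for "red" and "cars" (illustrative only).
red_lo, red_hi = np.array([0.0, 0.0]), np.array([2.0, 1.0])
cars_lo, cars_hi = np.array([1.0, 0.0]), np.array([3.0, 2.0])

# "red cars" is modeled by the intersection box; because the
# intersection lies inside the "cars" box, the containment
# "red cars" <= "cars" holds by construction.
inter_lo, inter_hi = box_intersection(red_lo, red_hi, cars_lo, cars_hi)
vol = box_volume(inter_lo, inter_hi)
```

In the fuzzy-set view used by Word2Box, such (smoothed) volumes of intersections play the role that dot products play for vector embeddings.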