Word embedding is a fundamental technology in natural language processing. It is often exploited for tasks using sets of words, although standard methods for representing word sets and set operations remain limited. If we can leverage the advantages of word embeddings for such set operations, we can calculate sentence similarity and find words that effectively share a concept with a given word set in a straightforward way. In this study, we formulate representations of sets and set operations in a pre-trained word embedding space. Inspired by \textit{quantum logic}, we propose a novel formulation of set operations using subspaces of that embedding space. Based on our definitions, we propose two metrics: the degree to which a word belongs to a set, and the similarity between two sets. Our experiments with the Text Concept Set Retrieval and Semantic Textual Similarity tasks demonstrated the effectiveness of our proposed method.
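To make the subspace formulation concrete, the following is a minimal sketch assuming a word set is represented by the subspace spanned by its pre-trained embeddings, membership is measured by the norm of a normalized word vector's projection onto that subspace, and set similarity is aggregated from the cosines of the principal angles between two subspaces. The function names (\texttt{subspace\_basis}, \texttt{membership}, \texttt{subspace\_similarity}) and the choice of mean aggregation are illustrative assumptions, not the paper's exact definitions.

\begin{verbatim}
import numpy as np

def subspace_basis(word_vectors):
    """Orthonormal basis of the subspace spanned by a word set.

    word_vectors: (n_words, dim) array of pre-trained embeddings.
    Returns a (dim, rank) matrix with orthonormal columns.
    """
    # Right singular vectors span the row space of the stacked vectors.
    _, s, vt = np.linalg.svd(word_vectors, full_matrices=False)
    rank = int(np.sum(s > 1e-10))
    return vt[:rank].T

def membership(word_vec, basis):
    """Degree to which a word belongs to the set: norm of the
    projection of the normalized word vector onto the subspace,
    which lies in [0, 1]."""
    v = word_vec / np.linalg.norm(word_vec)
    return float(np.linalg.norm(basis.T @ v))

def subspace_similarity(basis_a, basis_b):
    """Similarity between two sets: mean of the singular values of
    basis_a^T basis_b, i.e. the cosines of the principal angles
    between the two subspaces."""
    sigma = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(sigma))
\end{verbatim}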