Task-Specific Dependency-based Word Embedding Methods

This work proposes two task-specific dependency-based word embedding methods for text classification. In contrast with universal word embedding methods that work for generic tasks, we design task-specific word embedding methods to offer better performance on a specific task. Our methods follow the PPMI matrix factorization framework and derive word contexts from the dependency parse tree. The first method, called dependency-based word embedding (DWE), chooses keywords and neighbor words of a target word in the dependency parse tree as contexts to build the word-context matrix. The second method, named class-enhanced dependency-based word embedding (CEDWE), learns from word-context as well as word-class co-occurrence statistics. DWE and CEDWE are evaluated on popular text classification datasets to demonstrate their effectiveness. Experimental results show that they outperform several state-of-the-art word embedding methods.
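The PPMI matrix factorization framework mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' DWE or CEDWE implementation: the (word, context) pairs below are hypothetical and hand-written as if read off dependency parse trees, the function name `ppmi_embeddings` is invented for illustration, and truncated SVD is assumed as the factorization step. The abstract's keyword selection and word-class statistics are not modeled.

```python
import numpy as np

# Hypothetical (word, context) pairs: each word is paired with a neighbor
# (head or dependent) in an imagined dependency parse tree.
pairs = [
    ("movie", "great"), ("great", "movie"),
    ("movie", "boring"), ("boring", "movie"),
    ("film", "great"), ("great", "film"),
    ("film", "boring"), ("boring", "film"),
    ("plot", "boring"), ("boring", "plot"),
]


def ppmi_embeddings(pairs, dim=2):
    """Build a word-context PPMI matrix and factorize it with truncated SVD."""
    words = sorted({w for w, _ in pairs})
    contexts = sorted({c for _, c in pairs})
    w_idx = {w: i for i, w in enumerate(words)}
    c_idx = {c: j for j, c in enumerate(contexts)}

    # Word-context co-occurrence counts.
    counts = np.zeros((len(words), len(contexts)))
    for w, c in pairs:
        counts[w_idx[w], c_idx[c]] += 1.0

    # Joint and marginal probabilities.
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)
    p_c = p_wc.sum(axis=0, keepdims=True)

    # PMI = log(P(w, c) / (P(w) P(c))); unseen pairs give -inf,
    # which the positive clipping below maps to 0.
    with np.errstate(divide="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    ppmi = np.maximum(pmi, 0.0)

    # Truncated SVD; scaling by sqrt of singular values is one common choice.
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    dim = min(dim, len(s))
    emb = u[:, :dim] * np.sqrt(s[:dim])
    return words, ppmi, emb


words, ppmi, emb = ppmi_embeddings(pairs, dim=2)
for word, vec in zip(words, emb):
    print(word, np.round(vec, 3))
```

In this toy data "movie" and "film" share exactly the same contexts, so their PPMI rows, and therefore their embeddings, coincide. With real parses the contexts would come from a dependency parser rather than a hand-written list.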

Related content

Word vector representation

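A distributed representation expresses language as dense, low-dimensional, continuous vectors. Researchers found early on that learned word embeddings exhibit analogy relations, for example apple − apples ≈ car − cars and man − woman ≈ king − queen. Word embedding methods can be trained directly on large unlabeled corpora. The quality of the embeddings also depends heavily on the choice of context window size: a large context window usually yields embeddings that reflect topical information, while a small context window yields embeddings that reflect a word's function and its contextual semantics.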
