Two task-specific dependency-based word embedding methods are proposed for text classification in this work. In contrast with universal word embedding methods that target generic tasks, we design task-specific word embedding methods to offer better performance in a specific task. Our methods follow the PPMI matrix factorization framework and derive word contexts from the dependency parse tree. The first one, called dependency-based word embedding (DWE), chooses keywords and neighbor words of a target word in the dependency parse tree as contexts to build the word-context matrix. The second method, named class-enhanced dependency-based word embedding (CEDWE), learns from word-context as well as word-class co-occurrence statistics. DWE and CEDWE are evaluated on popular text classification datasets to demonstrate their effectiveness. Experimental results show that they outperform several state-of-the-art word embedding methods.
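To make the PPMI matrix factorization framework mentioned above concrete, the following is a minimal sketch (not the authors' implementation) of how word embeddings can be obtained from a word-context co-occurrence count matrix: compute positive pointwise mutual information, then factorize it with truncated SVD. The toy count matrix and the embedding dimension are hypothetical.

```python
import numpy as np

def ppmi_matrix(counts):
    """Positive PMI from a word-context co-occurrence count matrix."""
    total = counts.sum()
    word_counts = counts.sum(axis=1, keepdims=True)     # row marginals
    context_counts = counts.sum(axis=0, keepdims=True)  # column marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts * total) / (word_counts * context_counts))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf; clamp them
    return np.maximum(pmi, 0.0)   # keep only positive PMI values

def embed(counts, dim):
    """Factorize the PPMI matrix via truncated SVD to obtain embeddings."""
    m = ppmi_matrix(counts)
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    # A common convention: scale left singular vectors by sqrt of singular values.
    return u[:, :dim] * np.sqrt(s[:dim])

# Toy co-occurrence counts: 4 words x 3 contexts (hypothetical numbers).
C = np.array([[4.0, 0.0, 1.0],
              [3.0, 1.0, 0.0],
              [0.0, 5.0, 2.0],
              [1.0, 4.0, 3.0]])
E = embed(C, dim=2)
print(E.shape)  # (4, 2)
```

In DWE, the rows of such a count matrix would be target words and the columns dependency-derived contexts; CEDWE would additionally append word-class co-occurrence statistics as extra columns before factorization.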