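Project title: Graph-Based Word Sense Disambiguation and Domain Adaptation Based on Non-IID Learning Theory
Project number: No. 61502259
Project type: Young Scientists Fund
Approval year: 2016
Discipline: Computer Science
Project author: 鹿文鹏 (Lu Wenpeng)
Affiliation: Qilu University of Technology
Funding: 200,000 CNY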
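Chinese abstract (translated): Word sense disambiguation is a key foundational problem in natural language processing research. Graph models can effectively represent the semantic relations between sense concepts and can recast disambiguation as the problem of evaluating the importance of sense nodes. They achieve good disambiguation performance and have attracted considerable attention in recent years. However, graph-based word sense disambiguation still faces difficulties and challenges in setting the weights of relation edges, evaluating node importance, and providing domain adaptation mechanisms. Targeting these difficulties, this project will study graph-based word sense disambiguation and its domain adaptation. It will focus on sense similarity computation based on non-IID learning theory, which discards the independence assumption that traditional methods make about semantic attributes and analyzes the coupling relations among those attributes, so that the weights of relation edges in the graph model can be assessed accurately. In parallel, it will compare various graph evaluation strategies and propose an optimized node-importance evaluation mechanism that overcomes the excessive reliance of graph models on the PageRank algorithm. It will also study domain adaptation mechanisms for graph models, mining domain knowledge at the document, discourse, and sense levels to build and adjust the graph model and improve its disambiguation ability within specific domains. The project will produce a complete set of graph-based word sense disambiguation and domain adaptation methods, which will strongly support related research such as machine translation and information retrieval.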
Chinese keywords (translated): Word Sense Disambiguation; Sense Similarity; Non-IID Learning Theory; Domain Adaptation
English abstract: Word sense disambiguation (WSD) is a key foundational issue in natural language processing. Graph models can effectively express semantic relations among sense concepts and can convert WSD into the evaluation of sense node importance, and they achieve good disambiguation performance. Graph models have therefore received much attention in recent years. However, graph-based WSD still faces difficulties and challenges in setting the weights of relation edges, evaluating node importance, and adapting to specific domains. To address these difficulties, this project will study graph-based WSD and its domain adaptation. We will focus on computing sense similarity based on non-IIDness (not independent and identically distributed) learning theory, which abandons the independence assumption on semantic attributes, analyzes the coupling relations among them, and accurately evaluates the weights of relation edges in the graph model. At the same time, we will compare various strategies for evaluating graph models and propose an optimized method for evaluating node importance, which will overcome the over-reliance of graph-based WSD on the PageRank algorithm. In addition, we will study domain adaptation of graph-based WSD, building and adjusting the graph model with domain knowledge at the document, discourse, and sense levels to improve its disambiguation ability in specific domains. This project will produce a comprehensive set of methods for graph-based WSD and its domain adaptation, which will promote related research such as machine translation and information retrieval.
English keywords: Word Sense Disambiguation; Sense Similarity; Non-IIDness Learning Theory; Domain Adaptation
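The abstract describes graph-based WSD as turning disambiguation into ranking sense nodes by importance, with PageRank as the usual ranking step and edge weights taken from a sense similarity measure. The following is a minimal, non-authoritative sketch of that baseline pipeline, not the project's proposed method. The sense IDs and attribute sets are hypothetical toy data. The Jaccard edge weight is an assumed stand-in for an independence-assuming similarity, which is the kind of measure the project proposes to replace with a coupled (non-IID) one. Leaving competing senses of the same word unlinked is also an assumption of this sketch.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    # Assumed baseline similarity: treats attributes as independent set members.
    # The project proposes a coupled (non-IID) similarity in its place.
    return len(a & b) / len(a | b) if a | b else 0.0


def build_graph(candidates: dict) -> dict:
    """candidates: word -> {sense_id: attribute set} (hypothetical format).

    Returns an undirected weighted graph as sense -> {neighbor: weight}.
    """
    senses = {s: attrs for ws in candidates.values() for s, attrs in ws.items()}
    owner = {s: w for w, ws in candidates.items() for s in ws}
    graph = {s: {} for s in senses}
    for s, t in combinations(senses, 2):
        if owner[s] == owner[t]:
            continue  # assumption: competing senses of one word are not linked
        weight = jaccard(senses[s], senses[t])
        if weight > 0:
            graph[s][t] = weight
            graph[t][s] = weight
    return graph


def weighted_pagerank(graph: dict, damping: float = 0.85, iters: int = 50) -> dict:
    """Power iteration for PageRank with edge-weighted transitions."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in graph}
        for u, nbrs in graph.items():
            total = sum(nbrs.values())
            if total == 0:
                # Dangling (isolated) node: spread its rank uniformly.
                for v in graph:
                    new[v] += damping * rank[u] / n
                continue
            for v, w in nbrs.items():
                # Distribute rank in proportion to edge weight.
                new[v] += damping * rank[u] * w / total
        rank = new
    return rank


def disambiguate(candidates: dict) -> dict:
    """Pick, for each word, the candidate sense with the highest PageRank score."""
    rank = weighted_pagerank(build_graph(candidates))
    return {w: max(ws, key=rank.__getitem__) for w, ws in candidates.items()}
```

On a toy context of "bank", "loan", and "interest", where the finance, money, and rate senses share the attribute "money", those three senses reinforce one another and outrank the isolated river and hobby senses. Swapping `jaccard` for a coupled similarity, or `weighted_pagerank` for another node-importance measure, changes only one function, which mirrors the two research directions named in the abstract.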