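Project title: Research on Domain Adaptation for Text Sentiment Analysis Based on Instance Adaptation
Project number: No.61305090
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology; computer technology
Project author: Xia Rui (夏睿)
Affiliation: Nanjing University of Science and Technology (南京理工大学)
Funding: 230,000 CNY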
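Chinese abstract (translated): Domain adaptation for sentiment analysis has been a frontier problem and an active research topic in natural language processing in recent years. Research on domain adaptation has emphasized labeling adaptation at the expense of instance adaptation: labeling adaptation has received wide attention and in-depth study, whereas instance adaptation, held back by the difficulty of density ratio estimation, remains a comparatively weak and largely unexplored area. In response, this project carries out work in three areas: 1. An in-depth study of instance adaptation. The density ratio estimation problem is recast as computing the similarity between a sample and a distribution, and a PU learning based method is proposed for computing the similarity between source-domain samples and the target domain; 2. Based on this similarity, sample selection and weighted sampling methods for cross-domain statistical modeling are studied, in order to build a complete domain adaptation model based on instance adaptation; 3. Finally, the problem is extended to multiple source domains, exploring domain adaptation methods for sentiment analysis based on the collaboration of multiple source domains. The project expects to publish at least 6 papers in leading domestic and international journals and top international conferences. Its completion will help advance research on sentiment analysis methods for massive, multi-source Internet text in the "big data" setting, and has significant theoretical and practical value.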
Chinese keywords (translated): Sentiment Analysis;Domain Adaptation;Transfer Learning
English abstract: Domain adaptation in sentiment classification has been a frontier research direction in natural language processing in recent years. "Instance adaptation" and "labeling adaptation" are two basic factors in the domain adaptation problem. "Labeling adaptation" has received wide attention and in-depth study. However, because the density ratio is hard to estimate, "instance adaptation" has not been well studied in the literature. In this project, we conduct research in three aspects. First, we transform density ratio estimation into the problem of measuring the similarity between a sample and a distribution, and propose a PU (positive-unlabeled) learning based approach to address it. Second, based on the similarities obtained by PU learning, we study sample selection and importance sampling for instance adaptation. Finally, we extend the task from one source domain to multiple source domains and study the corresponding domain adaptation methods. The implementation of this project will help promote research on cross-domain sentiment classification, which is a fundamental problem of text mining on the "big-data" Internet.
English keywords: Sentiment Analysis;Domain Adaptation;Transfer Learning
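
Illustrative sketch (not part of the project record): the abstract describes recasting density ratio estimation as a similarity score between a source-domain sample and the target distribution, and then using that score for sample selection and weighting. The record does not specify the algorithm, so the Python below is only a minimal sketch under stated assumptions: a small logistic-regression domain classifier stands in for the PU learner (target-domain text is treated as "positive", source-domain text as "unlabeled"), and the toy reviews and every function name are hypothetical.

```python
import math

# Hypothetical toy data. Target-domain reviews play the role of "positive"
# examples of the target distribution; source-domain reviews are the
# "unlabeled" set whose similarity to the target domain we want to score.
TARGET_DOCS = [
    "battery life is great",
    "screen is bright and sharp",
    "battery drains too fast",
    "poor screen quality",
]
SOURCE_DOCS = [
    "plot is boring and slow",
    "great story and characters",
    "battery is great",
    "characters are dull",
]

def build_vocab(docs):
    """Map each distinct token to a feature index."""
    vocab = {}
    for doc in docs:
        for tok in doc.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(doc, vocab):
    """Bag-of-words count vector."""
    vec = [0.0] * len(vocab)
    for tok in doc.split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(x, w, b):
    """Similarity g(x): estimated probability that x comes from the target domain."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def train_domain_classifier(target_X, source_X, epochs=200, lr=0.1, l2=0.01):
    """L2-regularized logistic regression trained by SGD (target=1, source=0)."""
    data = [(x, 1.0) for x in target_X] + [(x, 0.0) for x in source_X]
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            err = score(x, w, b) - y
            w = [wi - lr * (err * xi + l2 * wi) for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def importance_weights(source_X, w, b, n_target, n_source):
    """Convert similarities into density-ratio estimates (n_s / n_t) * g / (1 - g)."""
    out = []
    for x in source_X:
        g = min(max(score(x, w, b), 1e-6), 1.0 - 1e-6)  # clip to avoid division by zero
        out.append((n_source / n_target) * g / (1.0 - g))
    return out

vocab = build_vocab(TARGET_DOCS + SOURCE_DOCS)
target_X = [featurize(d, vocab) for d in TARGET_DOCS]
source_X = [featurize(d, vocab) for d in SOURCE_DOCS]
w, b = train_domain_classifier(target_X, source_X)

similarities = [score(x, w, b) for x in source_X]
weights = importance_weights(source_X, w, b, len(target_X), len(source_X))

# Sample selection: keep the k source samples most similar to the target domain.
top_k = sorted(range(len(SOURCE_DOCS)), key=lambda i: similarities[i], reverse=True)[:2]

for i, doc in enumerate(SOURCE_DOCS):
    print(f"similarity={similarities[i]:.3f}  weight={weights[i]:.3f}  {doc}")
```

In this sketch the `weights` list could be passed as per-sample weights when training a sentiment classifier on the labeled source data, while `top_k` illustrates the simpler alternative of discarding source samples that look least like the target domain. A genuine PU learner would additionally correct for the fact that the "unlabeled" source set may itself contain target-like samples; that correction is omitted here.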