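Project title: Research on Topic-Specific Microblog Public Opinion Monitoring Technology Based on Semantic Understanding
Project number: No.61303190
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: Li Shasha (李莎莎)
Author affiliation: National University of Defense Technology of the Chinese People's Liberation Army
Funding amount: CNY 230,000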
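Chinese abstract: Online public opinion monitoring is one of the issues of greatest concern to governments and researchers today, and microblogs are both a focus and a difficulty of such monitoring. Taking the current mainstream Chinese microblog services as its research object and platform, this project studies the key technologies of topic-specific microblog public opinion monitoring, addressing the rapid generation and propagation, informality, and information sparsity of microblog content. Through thorough semantic understanding of topics and microblog text, it aims to improve the timeliness and accuracy of topic-specific public opinion sensing and tracking. First, because the informal nature of microblog data makes traditional natural language processing techniques hard to apply, the project studies normalization methods that remove noise from microblog data. Second, because the personalized, diverse, and fast-changing language of microblogs makes existing ontology libraries unsuitable, it studies how to build and update a microblog knowledge base suited to such data. Third, to address the information sparsity caused by the short length of microblog posts, it proposes a semantic representation method for microblog data. Fourth, given the high-speed data stream nature of microblog data and the real-time requirements of microblog public opinion monitoring, it studies efficient stream data processing algorithms based on semantic search. Finally, it establishes an interactive monitoring mechanism that continuously refines the monitoring model through human-computer interaction during monitoring.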
Chinese keywords: microblog; public opinion hotspot prediction and tracking; semantic understanding; word vectors; deep learning
English abstract: Online public opinion monitoring is one of the problems of greatest concern to governments and researchers. Furthermore, microblogs are the most important and most difficult application to monitor, since their content grows quickly and is neither regular nor dense. In this project, we will study key technologies for topic-oriented microblog monitoring in order to improve real-time performance and accuracy. First, to overcome the difficulty of applying traditional methods to irregular microblog data, we will study a method for reducing noise in microblog data. Second, we will study how to build a microblog database, related to users' interests, that is suitable for microblog data, and propose a formal representation method for microblog data. Third, given the high volume of microblog traffic and the real-time requirement of monitoring public sentiment on microblogs, we will propose an efficient method for processing microblog data. Fourth, we will build an interactive monitoring mechanism that lets the microblog database improve itself through human-machine interaction. This should continually increase monitoring accuracy as the monitoring system is used.
English keywords: microblog; public opinion tracking; semantic understanding; word vector; deep learning