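Project title: Research on Short-Text Topic Detection and Tracking for Microblog Platforms
Project number: No.61303115
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology; computer technology
Project author: 李飞 (Li Fei)
Affiliation: Wuhan University
Funding: 230,000 yuan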
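Abstract (translated from the Chinese): Microblogs have become a popular platform for publishing and exchanging information, offering real-time information and rich content that traditional platforms lack. Carrying out Topic Detection and Tracking (TDT) on a platform that gathers such massive amounts of information will help people grasp important information in time. However, the short-text nature of microblog messages and the small-sample character of topic-related information on the platform make TDT difficult there. Current research on TDT, in China and abroad, is mostly limited to long texts and rarely addresses short-text settings. Targeting the massive information on microblogs, an emerging and rapidly developing social networking platform, this project proposes a new text fusion framework that combines short-text analysis with user-feature analysis. It uses text similarity computation, LDA topic mining, and fitting-based sentence ranking to detect, track, and analyze microblog topics dynamically, and returns key topic information in forms users can easily understand, such as diagrams and lists. The framework helps users grasp the overall background of an event and offers predictions of how trends will develop, giving decision-makers high-quality decision support.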
Keywords (translated from the Chinese): microblog; short text; topic detection; topic tracking; high-quality microblog
English abstract: Microblogs have become a popular platform for information exchange, carrying far more real-time information than traditional platforms. Carrying out Topic Detection and Tracking (TDT) on microblog platforms will help people grasp important information in time. However, the short length of microblog messages and the small number of samples related to any one topic make TDT more difficult there. Current TDT research is mostly limited to long texts and rarely addresses short-text environments. This project focuses on the vast amount of information on this emerging and rapidly developing kind of social networking platform. It proposes a new text fusion framework that combines short-text analysis with user-characteristic analysis. Text similarity computation, LDA topic mining, and fitting-based sentence ranking are used to detect, track, and analyze microblog topics dynamically. Key topic information is returned as diagrams and lists that users can understand easily. The framework helps users grasp the overall background of an event and offers trend predictions, giving policy-makers high-quality decision support.
English keywords: microblog feed streams; topic detection; topic tracking; topic abstraction; high-quality microblog