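Project title: Targeted Topic Discovery and Tracking in Microblogs
Project number: No.61502447
Project type: Young Scientists Fund project
Approval year: 2016
Discipline: Other
Project author: Yan Xiaohui (晏小辉)
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences
Funding amount: 210,000 CNY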
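Chinese abstract (translated): Extracting the information users care about from massive, disorderly microblog data has long been a difficult problem. Starting from the characteristics of microblog data, this project studies how to automatically discover and track the topics and posts related to keywords entered by a user. The project faces the following challenges: 1) microblog posts are extremely short, which causes severe data sparsity for existing topic discovery and message retrieval methods; 2) microblogs contain a great deal of noisy data; 3) microblog data is updated quickly, and the content of topics evolves continuously over time. The project first studies targeted topic modeling methods for short texts that incorporate users' prior knowledge. Building on this, it then studies online learning algorithms for these models to meet the needs of real-time topic tracking. Finally, it studies learning-to-rank methods for retrieving the posts relevant to a topic. This research can improve the modeling and mining of streaming short texts in microblogs and provide key technical support for applications such as online public opinion monitoring and business intelligence analysis. The project therefore has significant research and application value.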
Chinese keywords (translated): microblog;topic model;topic tracking;targeted topics;public opinion monitoring
English abstract: How to extract information of interest to users from massive and messy microblog posts is a challenging problem. Based on the characteristics of microblog data, we study how to automatically discover and track targeted topics, i.e., topics related to keywords provided by users. There are three major challenges: 1) microblog posts are extremely short, which causes a severe data sparsity problem for existing topic discovery and short text retrieval methods; 2) the posts contain a large amount of noisy data; 3) microblog data changes fast, so the content of topics also evolves over time. In this project, we first study how to exploit the prior knowledge of users to guide the modeling of targeted topics over short texts. Then, we develop online learning algorithms for real-time targeted topic tracking. Finally, we study how to retrieve the posts related to a topic using learning-to-rank techniques. This project can improve the modeling and mining of streaming short texts in microblogs and provide key technical support for applications such as public opinion monitoring and business intelligence analysis. Hence, this project has important value for both research and applications.
English keywords: microblog;topic models;topic tracking;targeted topics;public opinion monitoring