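Project title: Research on Summarization of Social Emergencies Based on Data Reconstruction
Project number: No.61472277
Project type: General Program
Approval year: 2015
Discipline: Computer Science
Principal investigator: Ruifang He (贺瑞芳)
Affiliation: Tianjin University
Funding: 820,000 CNY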
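Chinese abstract (translated): Social media has become a dominant means of communication, and the frequent occurrence of unconventional emergencies makes research on efficient information acquisition from social media urgent. With crisis emergency response and emergency decision-making as its target applications, this project studies collections of microblog posts on specific crisis-event topics generated on social media. It explores new, accelerated summarization algorithms for social emergencies based on data reconstruction, so that content selection satisfies importance, credibility, novelty, and coverage. It addresses three new challenges posed by social media: (1) posts are short, colloquial, and unstructured; (2) they are social and carry varying degrees of trustworthiness; (3) they are temporally redundant. Because traditional summarization methods cannot cope with these challenges, the project approaches summarization from the perspective of compressive sensing and data reconstruction and, drawing on sparse learning and related findings in sociology, builds the challenges into sparse optimization models. It proposes: (1) selection of important time points based on wavelet analysis with adaptive time windows; (2) joint content selection based on group-structured sparse learning; (3) optimized content selection incorporating credibility modeling; and (4) content selection guided by temporal evolution, based on the Sparse Fused Group Lasso. On this basis it develops an accelerated machine learning framework for content selection in social emergency summarization, which has significant research and application value.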
Chinese keywords (translated): Microblog Summarization; Social Media; Information Extraction; Text Mining
English abstract: Frequent crises make it urgent to study how to acquire useful information efficiently from social media, which is now a dominant means of communication. In the context of crisis response and decision support, we take crisis-oriented, topic-specific microblog collections from social media as input and explore how to produce, quickly, summaries of social crises that are highly important, credible, minimally redundant, and broad in coverage. We address the new challenges of social media: posts are (1) short, informal, and unstructured; (2) social, with varying credibility; and (3) temporally redundant. Since traditional methods cannot handle these challenges, we study automatic summarization from the perspective of compressive sensing and data reconstruction, and build the challenges into optimization models using sparse learning and findings from sociology. We mainly propose (1) wavelet-analysis-based detection of important time points with adaptive time windows; (2) joint content selection based on group sparse learning; (3) content selection strengthened by credibility modeling; and (4) temporal-dynamics-guided content selection based on the sparse fused group lasso. Developing these new machine learning methods for content selection in social crisis summarization therefore has significant research and application value.
English keywords: Microblog Summarization; Social Media; Information Extraction; Text Mining
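The abstracts frame summarization as data reconstruction: pick a small subset of posts that can linearly reconstruct the whole collection, with sparsity enforced by a group penalty. The sketch below is not the project's method; it is a minimal, generic illustration under our own assumptions. Posts are the columns of a term-by-post matrix `X`, the objective is 0.5·||X − XW||_F² + λ·Σ_i ||W[i,:]||₂ (an ℓ2,1 row-sparse penalty, i.e. a group lasso over rows of `W`), and it is solved by proximal gradient descent in plain Python. The function names `select_posts` and `largest_eigenvalue`, the parameter `lam`, and the toy matrix are hypothetical, and the wavelet time windows, credibility modeling, and fused temporal terms are omitted.

```python
import math


def transpose(A):
    # Swap rows and columns of a list-of-lists matrix
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    # Plain-Python product of an m×k matrix A and a k×n matrix B
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def largest_eigenvalue(G, iters=100):
    # Power iteration on a symmetric PSD matrix; with G = X^T X this is the
    # Lipschitz constant of the gradient of the reconstruction loss
    v = [1.0 / math.sqrt(len(G))] * len(G)
    lam_max = 0.0
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(len(v))) for i in range(len(G))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam_max = norm
    return lam_max


def select_posts(X, lam, iters=500):
    """Hypothetical sketch: min_W 0.5*||X - X W||_F^2 + lam * sum_i ||W[i, :]||_2.

    X is a d×n term-by-post matrix whose columns are posts. Returns one score per
    post (the l2 norm of its row in W); a score of 0 means the post is not used to
    reconstruct any post, so only posts with positive scores are summary candidates.
    """
    n = len(X[0])
    G = matmul(transpose(X), X)          # n×n Gram matrix X^T X
    step = 1.0 / largest_eigenvalue(G)   # 1/L step size for proximal gradient
    W = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Gradient of the smooth part: X^T (X W - X) = G W - G
        GW = matmul(G, W)
        Z = [[W[i][j] - step * (GW[i][j] - G[i][j]) for j in range(n)] for i in range(n)]
        # Proximal step for step*lam*sum_i ||row_i||_2: group soft-thresholding per row
        W = []
        for row in Z:
            r = math.sqrt(sum(x * x for x in row))
            scale = max(0.0, 1.0 - step * lam / r) if r > 0.0 else 0.0
            W.append([scale * x for x in row])
    return [math.sqrt(sum(x * x for x in row)) for row in W]


if __name__ == "__main__":
    # Toy 3-term × 4-post matrix (columns are posts); the values are made up
    X = [[1.0, 0.9, 0.0, 0.1],
         [0.0, 0.1, 1.0, 0.9],
         [0.5, 0.4, 0.5, 0.5]]
    print([round(s, 3) for s in select_posts(X, lam=0.5)])
```

Posts would then be ranked by score; raising `lam` zeroes out more rows and so yields a shorter summary. Row-wise (group) shrinkage is used rather than an elementwise ℓ1 penalty because it removes a post from every reconstruction at once, which is what "selecting" a post means here.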