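Project title: Research on the Detection of News Topic Threads and Themes
Project number: No.60873134
Project type: General Program
Year of approval: 2009
Discipline: Biological Sciences
Project author: Li Fang (李芳)
Affiliation: Shanghai Jiao Tong University
Funding: 300,000 yuan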
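Abstract (translated from Chinese): News topics have great influence in the Internet era. This research automatically processes and organizes news reports. By revealing the semantic information in documents, it reflects the breadth and dynamics of news topics and helps users obtain information. The research covers three aspects. The first studies the representation of news events and models of the relations between events; it proposes a method for linking news reports based on event words and a sentence-based method for detecting causal relations. The second uses the LDA topic model to extract semantic information from news reports on a specific event; it proposes a criterion for identifying thread words, a thread extraction algorithm, and an n-gram topic model based on event background words. The third studies the evolution of news topics; it proposes a topic evolution method based on topic association that reflects many-to-many relations between topics as well as the emergence of new topics and the disappearance of old ones. The topic evolution technique has been applied not only to event news reports but also to the academic domain, to study hot topics and influential topics in science and technology and their development trends. The project has published 21 papers, with 4 more accepted or under review, and has trained 12 master's students. International cooperation included an invitation to the EU LCT ERASMUS MUNDUS programme, with invited lectures and academic exchange. Systems were implemented for automatic clustering of news reports, automatic analysis of domestic and international news special topics, and analysis of reports on the Two Sessions (两会) over the past five years. One software copyright application has been submitted, and one patent application is planned. The research can be applied to the automatic organization of special-topic news and to the automatic analysis of scientific literature and patent documents in libraries.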
Keywords (translated from Chinese): LDA application; news topic detection; topic evolution; news thread and theme extraction
English abstract: News topics have a great impact on the Internet. This research focuses on news reports, with the aim of processing and organizing them automatically. It helps users understand what is going on in the world by unveiling the semantic content of news reports. The research has three aspects. The first concerns the representation of news events and the modeling of relationships between events. A method for tracking news reports based on event words and a method for recognizing causal relations are proposed. The second concerns the extraction algorithm for news threads. Thematic words are first extracted with the LDA model, and phrases based on the thematic words are then generated iteratively. We also propose a topic n-gram model with a background distribution for news reports. The third concerns topic evolution. A method of topic evolution based on related topics is proposed. It can detect several kinds of relationships among the topics of any two consecutive time periods, such as one-to-many and many-to-many relationships. The topic evolution method has been applied not only to news reports but also to academic conference papers. It can find hot topics and track how their content and strength change over time. We have published 21 papers in journals and conferences; 2 further papers have been accepted and will be published in the future. 12 graduate students were funded by this research. The principal investigator visited Saarland University in Germany and DFKI as a scholar of the LCT Erasmus Mundus Programme. Three systems have been implemented based on the research results. We have applied for a software copyright certificate for news thread extraction and plan to apply for a patent covering the whole system. This research can be used in applications such as browsing news events and analyzing academic literature and patent documents.
English keywords: LDA application; news topic detection and tracking; topic evolution; news thread extraction