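Project title: 网络信息的话题挖掘和分析关键技术研究 (Research on Key Technologies for Topic Mining and Analysis of Web Information)
Project number: No.60873097
Project type: General Program (面上项目)
Year approved: 2009
Discipline: Life Sciences (生物科学)
Project author: 王挺 (Wang Ting)
Author affiliation: 中国人民解放军国防科学技术大学 (National University of Defense Technology, Chinese People's Liberation Army)
Funding: 380,000 CNY (38万元)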
Chinese abstract: 对网络信息的话题内容进行智能处理,不仅具有重要的应用价值,而且在科学研究上也极具挑战性,是目前学术界研究的热点。针对网络信息在话题内容上的演变性、在传播方式上的流动性和社会性等特点,本项目把网络信息的话题挖掘和分析问题放在社会网络这一背景下进行,通过有机结合话题分析和社会网络分析这两方面的研究,以自然语言处理技术和机器学习技术为基本手段,达到提高网络信息内容分析准确性的目标。本项目主要研究内容包括:多层次多特征话题信息自适应过滤技术;以事件为核心的话题描述框架,以及基于事件模型的话题发现和信息抽取技术;面向网络文本信息的社会关系挖掘和社会网络分析技术;在此基础上,以社会网络挖掘为基础,有机融合网页的结构特征、文本内容的语义特征、信息传播特征和社会关系网络特征等,实现多特征融合的特定话题信息流的跟踪,以揭示重要话题的传播和演化规律,提高互联网信息的话题挖掘和分析的准确性。
Chinese keywords: 话题发现和跟踪;社会网络分析;自然语言处理;信息抽取;信息过滤
English abstract: Intelligent processing of the topical content of Web information has significant application value and poses considerable scientific challenges, making it an active focus of current academic research. Web information evolves in topical content and is fluid and social in the way it spreads. In view of these characteristics, this project places topic mining and analysis of Web information in the context of social networks. It combines topic analysis with social network analysis, using natural language processing and machine learning as the basic techniques, with the goal of improving the accuracy of Web content analysis. The main research contents are: adaptive multi-level, multi-feature filtering of topic information; an event-centered topic description framework, together with event-model-based topic detection and information extraction; and social relationship mining and social network analysis for Web text. Building on this work, and on social network mining in particular, the project fuses the structural features of Web pages, the semantic features of text content, information propagation features, and social network features to track the information stream of a specific topic. The aim is to reveal how important topics spread and evolve and to improve the accuracy of topic mining and analysis of Web information.
English keywords: Topic Detection and Tracking; Social Network Analysis; Natural Language Processing; Information Extraction; Information Filtering