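Project Title: Research on Key Technologies of Automatic Document Summarization for Internet Public Opinion Analysis
Project Number: No.60873155
Project Type: General Program
Approval Year: 2009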
Project Discipline: Light Industry, Handicrafts
Project Author: Xiaojun Wan (万小军)
Author Affiliation: Peking University
Project Funding: 290,000 CNY
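Chinese Abstract: Automatic document summarization aims to distill and summarize text content so that users can obtain information quickly. Internet public opinion content includes news, blogs, comments, and similar material, and it is massive, dynamically evolving, multilingual, and sentiment-related; these characteristics pose great challenges to traditional automatic document summarization techniques. This project first studied techniques related to automatic document summarization, and then studied in depth new techniques such as dynamic evolutionary summarization, cross-lingual summarization, comparative summarization, and sentiment analysis and opinion extraction. The project achieved academic breakthroughs in several key techniques. Based on the project's research results and related results, 24 high-quality academic papers were published, 14 of them in the top international journals Computational Linguistics and ACM Transactions on Information Systems and at the top international conferences ACL, SIGIR, IJCAI, COLING, EMNLP, and ICDM. The project team took first place in several authoritative international evaluations in related fields. Seven national invention patent applications were filed, and some of the techniques have been successfully applied in Internet public opinion analysis systems.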
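Chinese Keywords: Automatic Document Summarization; Cross-Lingual Summarization; Dynamic Evolutionary Summarization; Comparative Summarization; Sentiment Analysis and Opinion Extraction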
English Abstract: Automatic document summarization aims to distill and summarize the main content of texts, helping users acquire useful information quickly. Web documents include news articles, blogs, comments, etc. Unlike traditional document summarization, summarizing Web documents is very challenging because of their massive, evolutionary, multilingual, and sentiment-related characteristics. In this project, we first investigated related techniques of document summarization, and then investigated several new techniques, including evolutionary summarization, cross-lingual summarization, comparative summarization, and sentiment analysis and opinion extraction. Based on the academic breakthroughs achieved in this project, twenty-four high-quality papers have been published, fourteen of them in leading international journals (Computational Linguistics and ACM Transactions on Information Systems) and at leading international conferences (ACL, SIGIR, IJCAI, COLING, EMNLP, and ICDM). We participated in several leading international evaluations and ranked first in each. Seven patent applications have been filed. Several techniques have been applied in a real-world system for Internet public opinion analysis.
English Keywords: Document Summarization; Cross-Lingual Summarization; Evolutionary Summarization; Comparative Summarization; Sentiment Analysis and Opinion Extraction