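Project title: Research on Key Issues in Information Diversity and Information Summarization
Project number: No.61272227
Project type: General Program
Year of approval: 2013
Discipline: Automation technology, computer technology
Project author: Minlie Huang (黄民烈)
Affiliation: Tsinghua University
Funding: 820,000 yuan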
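Chinese abstract: How to ensure information diversity is a problem common to many information processing tasks, arising widely in information retrieval, document summarization, automatic question answering, recommender systems, and information network mining. This project aims to address two key scientific questions in information diversity: (1) the basic descriptive unit and the measurement of information diversity, that is, what kind of content is diverse and to what degree; (2) given an information need, how to obtain an information summary that meets diversity requirements so as to satisfy all users to the greatest extent. Our overall goal is to propose representation and measurement methods for information diversity and to build a unified computational framework that generates content satisfying diversity requirements. In this framework, information at different granularities is collectively called "information units"; user needs and information units are described through a subtopic space; and the information summary provides diversified, well-structured, multi-granularity, and multimodal content. To this end, we will study representation and measurement methods for information diversity; study the organizational structure of information summaries and methods for extracting it; establish diversity-aware summarization algorithms and theory suited to Web information processing; and study how to select the granularity and modality of an information summary according to the information need.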
Chinese keywords: information diversity; summarization; recommendation; representation and measurement; clustering
English abstract: Many information processing tasks share a common problem: how to generate diversified results. This problem has been observed repeatedly in research tasks such as information retrieval, document summarization, automatic question answering, social recommender systems, and information network mining. This proposal targets two key issues: first, what the representation unit of information diversity is and how diversity can be quantified; second, given an information need, how to generate diversified content that maximally satisfies the average user. Our goal is to propose a method for representing and measuring diversity, and to establish a unified framework that generates information content (which we call an information summary) that is both diverse and relevant. In this framework, information at different granularities, such as a document, sentence, or aspect, is termed an information unit. An information need, which may be represented by a set of keywords, a natural language question, or a task description, and an information unit can both be described via a subtopic space. The generated information summary will provide diversified, well-structured content that may consist of units of different granularities, including document, sentence, or aspect, and may have information of different moda
English keywords: information diversity; summarization; recommendation; representation; clustering