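Project title: Research on Higher-order Information Retrieval Based on Cascade Relations Between Query Terms
Project number: No.61202181
Project type: Young Scientists Fund project
Year of approval: 2013
Discipline: Computer Science
Project author: Qiao Yanan (乔亚男)
Affiliation: Xi'an Jiaotong University
Funding: 250,000 CNY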
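Chinese abstract (translated): Traditional information retrieval systems typically take a flat list of parallel query terms as input, which reflects the user's information need only coarsely. In practice, the query terms a user supplies often stand in a hierarchical relation to one another: the documents the user actually wants must not only contain the specified query terms, but the relative positions of those terms within the document must also satisfy specific multi-level subordinate relations, that is, a cascade relation among the query terms. Information retrieval based on such cascade relations is called higher-order information retrieval. To some extent it unifies several existing research directions, such as public opinion analysis, chain-of-events analysis, trend analysis, and text sentiment classification. This project attempts to build a unified model for formulating and analyzing higher-order information retrieval problems, mining the deeper relations among query terms and solving these problems in a more general way. In this model, documents and queries are abstracted as document tensors and query tensors, and matching a document against a query becomes a similarity computation between the document tensor and the query tensor. This allows a more direct treatment of higher-order retrieval problems that traditional information retrieval models in essence reduce to first order.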
Chinese keywords (translated): Information Retrieval;Social Network Analysis;Wikipedia;Microblog;New Word Detection
English abstract: The input to a traditional Information Retrieval system is usually a set of query terms in parallel relation to one another, and this type of input reflects the user's information need only roughly. In practical applications, more complex relations frequently hold between query terms: the documents a user needs must not only contain the query terms, but the relative positions of those terms must also satisfy certain hierarchical relations. This proposal defines this setting as "Higher-order Information Retrieval" and refers to traditional Information Retrieval as "first-order Information Retrieval". Research fields such as Public Opinion Analysis, Chain of Events Analysis, Trend Analysis and Text Sentiment Classification have reflected the concept of Higher-order Information Retrieval only vaguely in previous studies. In this proposal, we try to propose a unified model for Higher-order Information Retrieval problems that mines the deeper relations between query terms and resolves these problems in a more general way. In this unified model, documents and queries are converted to document tensors and query tensors, and the matching of documents against queries is converted to the calculation of a similarity function between document tensors and query tensors. This unified model is appropriate
English keywords: Information Retrieval;Social Network Analysis;Wikipedia;Microblog;New Word Detection
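
The abstract describes matching as a similarity computation between a document tensor and a query tensor, but it does not say how the tensors are built. The following is only a minimal sketch of that idea under loudly stated assumptions, not the project's actual model: the function names `order2_tensor` and `cosine` are hypothetical, the tensor is second order only, a term is treated as subordinate to another simply when it appears later in the same sentence, and cosine similarity stands in for the unspecified similarity function.

```python
import math
from collections import Counter
from itertools import combinations


def order2_tensor(sentences):
    """Build a sparse second-order tensor: (head, dependent) -> count.

    Illustrative assumption only: term b counts as subordinate to term a
    whenever a precedes b in the same sentence.
    """
    tensor = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # combinations() keeps positional order, so tokens[i] precedes tokens[j].
        for i, j in combinations(range(len(tokens)), 2):
            tensor[(tokens[i], tokens[j])] += 1
    return tensor


def cosine(t1, t2):
    """Cosine similarity between two sparse tensors stored as dicts."""
    dot = sum(value * t2.get(key, 0) for key, value in t1.items())
    norm1 = math.sqrt(sum(value * value for value in t1.values()))
    norm2 = math.sqrt(sum(value * value for value in t2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


# Hypothetical query: "phone" subordinate to "apple".
query = Counter({("apple", "phone"): 1})

# Two toy documents with identical bags of words but opposite term order.
doc_a = ["apple releases phone"]
doc_b = ["phone releases apple"]

score_a = cosine(query, order2_tensor(doc_a))
score_b = cosine(query, order2_tensor(doc_b))
print(score_a > score_b)
```

A first-order bag-of-words model would score the two toy documents identically, since they contain the same terms. The order-sensitive tensor separates them, which is the kind of distinction the abstract attributes to higher-order retrieval.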