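Project title: Research on Community-Oriented Collaborative Retrieval Methods
Project number: No.61202286
Project type: Young Scientists Fund project
Approval year: 2013
Discipline: Computer Science
Project author: Liu Yongli (刘永利)
Institution: Henan Polytechnic University (河南理工大学)
Funding: CNY 230,000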
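Chinese abstract: With the rapid development of social networks, collaborative retrieval has become a research focus in information retrieval, and it has practical significance for improving the accuracy and efficiency of retrieval for users within a community. Communities change constantly, so community information must be updated continuously, but three characteristics of the retrieval process make this difficult: ① data sparsity; ② a high-dimensional feature space; ③ frequent data updates. This project is organized around these three characteristics and covers: ⑴ a three-dimensional relevance model: build a three-dimensional space of users, queries, and documents, and use probabilistic methods to quantify the relevance among the three dimensions; ⑵ a co-clustering-based method for determining communities dynamically: to address characteristics ① and ②, extend the information-theoretic co-clustering method, originally used only to analyze two-dimensional contingency tables, so that it applies to three-dimensional problems, and thereby determine user communities dynamically; ⑶ an incremental learning mechanism: to address characteristic ③, start from the dimension in which the new data lies and incrementally update the three-dimensional probabilistic relations and the co-clustering results. Based on the probabilistic relations of the three-dimensional space, the project focuses on the dynamic updating of communities, balances theoretical analysis with practical validation, and offers new ideas for further research on and application of collaborative retrieval methods.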
Chinese keywords: information bottleneck;information retrieval;fuzzy clustering;co-clustering;incremental clustering
English abstract: With the rapid development of social networks, collaborative retrieval has become a research focus in the information retrieval field, with practical significance for raising the accuracy and efficiency of Web search in communities. A community changes constantly, which makes it necessary to identify communities in dynamic networks, but three characteristics of the search process become bottlenecks: ① sparsity, ② high dimensionality, and ③ dynamic data. This project mainly includes the following three parts. ⑴ Probabilistic methods are employed to express the relevance among user, query, and document. ⑵ To deal with sparse and high-dimensional data, we innovatively extend information-theoretic co-clustering methods, originally used only to analyze two-dimensional contingency tables, so that they are suitable for three-dimensional data. ⑶ For dynamic data, we start with the dimension that needs to be updated and incrementally renew the three-dimensional relevance and co-clustering results, which can improve update efficiency. We also analyze and discuss a recommendation method based on collaboration and real-time properties. Our project is based on the probabilistic relevance of the three-dimensional space, focuses on identifying communities dynamically, takes into account both theoretical analysis and experimental ver
English keywords: information bottleneck;information retrieval;fuzzy clustering;co-clustering;incremental clustering
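
The abstract describes a probabilistic relevance model over users, queries, and documents that is updated incrementally. The following is a minimal illustrative sketch of that idea, not the project's actual method: it assumes relevance is estimated from raw co-occurrence counts of (user, query, document) triples, and the class name `RelevanceTensor` and all of its methods are hypothetical. It computes the mutual information between two dimensions, which is the quantity that information-theoretic co-clustering tries to preserve when rows and columns are grouped; the clustering step itself is not shown.

```python
from collections import Counter
from math import log2


class RelevanceTensor:
    """Hypothetical sparse count tensor over (user, query, document) triples."""

    def __init__(self):
        self.counts = Counter()  # only observed cells are stored (sparse data)
        self.total = 0

    def observe(self, user, query, doc, weight=1):
        # Incremental update: only the touched cell and the total change.
        self.counts[(user, query, doc)] += weight
        self.total += weight

    def joint(self, user, query, doc):
        # Empirical joint probability p(user, query, doc).
        if self.total == 0:
            return 0.0
        return self.counts[(user, query, doc)] / self.total

    def marginal(self, axes):
        # Sum out every axis not listed in `axes` (0=user, 1=query, 2=doc).
        out = Counter()
        for key, count in self.counts.items():
            out[tuple(key[a] for a in axes)] += count / self.total
        return out

    def mutual_information(self, a, b):
        # I(A; B) in bits between two of the three dimensions.
        p_ab = self.marginal((a, b))
        p_a = self.marginal((a,))
        p_b = self.marginal((b,))
        return sum(
            p * log2(p / (p_a[(x,)] * p_b[(y,)]))
            for (x, y), p in p_ab.items()
            if p > 0
        )


tensor = RelevanceTensor()
tensor.observe("u1", "q1", "d1")
tensor.observe("u2", "q2", "d2")
print(tensor.mutual_information(0, 1))  # user-query dependence, in bits
```

Storing only observed triples keeps memory proportional to the number of interactions rather than to the full user × query × document space, which is one simple way to cope with the sparsity and high dimensionality the abstract mentions.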