The goal of rank fusion in information retrieval (IR) is to produce a single output list from multiple search results. Improving performance by combining the outputs of several IR systems is a challenging task. A central difficulty is that many non-obvious factors are involved in estimating relevance, which induces nonlinear interrelations in the data. Modeling complex dependencies between random variables has become increasingly popular in information retrieval, and the need to explore these dependencies further for data fusion has recently been acknowledged. Copulas provide a framework for separating the dependence structure from the margins. Inspired by the theory of copulas, we propose a new unsupervised, dynamic, nonlinear rank fusion method based on a nested composition of non-algebraic function pairs. The dependence structure of the model is tailored by leveraging query-document correlations on a per-query basis. We experimented with three topic sets over CLEF corpora, fusing 3 and 6 retrieval systems, and compared our method against the CombMNZ technique and other nonlinear unsupervised strategies. The experiments show that our fusion approach improves performance under explicit conditions, providing insight into the circumstances under which linear fusion techniques perform comparably to nonlinear methods.
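For readers unfamiliar with the CombMNZ baseline named in the abstract, a minimal sketch follows. CombMNZ scores each document as the sum of its normalized scores across systems, multiplied by the number of systems that retrieved it. The min-max normalization, the function names `min_max_normalize` and `comb_mnz`, and the toy document identifiers are illustrative assumptions; the abstract does not state which normalization or implementation was used in the experiments.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale one system's scores into [0, 1] (assumed normalization).

    A run whose scores are all equal maps every document to 1.0.
    """
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_mnz(runs):
    """Fuse several runs (dicts of doc id -> raw score) with CombMNZ.

    Fused score = (sum of normalized scores) * (number of runs that
    retrieved the document). Returns (doc, score) pairs, best first;
    ties are broken by document id for determinism.
    """
    total = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            total[doc] += score
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical scores from three retrieval systems.
    runs = [
        {"d1": 3.0, "d2": 1.0, "d3": 2.0},
        {"d1": 0.9, "d3": 0.1},
        {"d2": 10.0, "d3": 5.0, "d4": 0.0},
    ]
    for doc, score in comb_mnz(runs):
        print(doc, score)
```

Because the fused score is a fixed linear combination of the per-system scores, it serves as the linear reference point against which a nonlinear, per-query fusion method can be compared.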