Ranked list truncation is of critical importance in professional information retrieval applications such as patent search and legal search. The goal is to dynamically determine the number of returned documents according to some user-defined objective, so as to balance the overall utility of the results against user effort. Existing methods formulate this task as a sequential decision problem and optimize a pre-defined loss as a proxy objective, which suffers from two limitations: decisions are made locally, and the optimization does not directly target the objective. In this work, we propose a global-decision-based truncation model named AttnCut, which directly optimizes user-defined objectives for ranked list truncation. Specifically, we adopt the Transformer architecture to capture the global dependencies within the ranked list for the truncation decision, and employ reward augmented maximum likelihood (RAML) for direct optimization. We consider two types of user-defined objectives of practical interest. One is a widely adopted balanced metric such as F1; the other is the best F1 under a minimal recall constraint, a typical objective in professional search. Empirical results on the Robust04 and MQ2007 datasets demonstrate the effectiveness of our approach compared with state-of-the-art baselines.
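To make the two user-defined objectives concrete, the following is a minimal oracle sketch (not the AttnCut model itself, whose Transformer-based predictor is not shown here): given the binary relevance labels of a ranked list, it computes F1 at each truncation depth and returns the best cut, optionally subject to a minimal recall constraint. All function names are illustrative.

```python
from typing import List, Optional

def f1_at_cut(labels: List[int], k: int) -> float:
    """F1 score of returning the top-k documents of a ranked list
    with binary relevance labels (1 = relevant, 0 = non-relevant)."""
    total_rel = sum(labels)
    if k == 0 or total_rel == 0:
        return 0.0
    rel_in_topk = sum(labels[:k])
    precision = rel_in_topk / k
    recall = rel_in_topk / total_rel
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def best_cut(labels: List[int], min_recall: Optional[float] = None) -> int:
    """Oracle truncation depth: the k maximizing F1, optionally only
    among cuts whose recall meets min_recall (the professional-search
    objective). Returns 0 if no cut satisfies the constraint."""
    total_rel = sum(labels)
    best_k, best_f1 = 0, 0.0
    for k in range(1, len(labels) + 1):
        if min_recall is not None and total_rel > 0:
            if sum(labels[:k]) / total_rel < min_recall:
                continue  # skip cuts that violate the recall constraint
        f1 = f1_at_cut(labels, k)
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    return best_k

# Example: for labels [1, 1, 0, 1, 0, 0], cutting at k=4 captures all
# three relevant documents with precision 3/4, giving the best F1.
print(best_cut([1, 1, 0, 1, 0, 0]))                  # → 4
print(best_cut([1, 1, 0, 1, 0, 0], min_recall=1.0))  # → 4
```

A learned truncation model is trained to predict cuts that score well under exactly these objectives; RAML, as used in the paper, rewards candidate cuts in proportion to such metric values rather than optimizing a surrogate loss.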