Online Learning to Rank (OL2R) eliminates the need for explicit relevance annotations by directly optimizing rankers from their interactions with users. However, the required exploration drives it away from successful practices in offline learning to rank, which limits OL2R's empirical performance and practical applicability. In this work, we propose to estimate a pairwise learning to rank model online. In each round, candidate documents are partitioned and ranked according to the model's confidence in the estimated pairwise rank order, and exploration is performed only on the uncertain pairs of documents, i.e., \emph{divide-and-conquer}. A regret bound defined directly on the number of mis-ordered pairs is proven, which connects the online solution's theoretical convergence with its expected ranking performance. Comparisons against an extensive list of OL2R baselines on two public learning to rank benchmark datasets demonstrate the effectiveness of the proposed solution.
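The divide-and-conquer idea above can be sketched as follows. This is a toy illustration, not the paper's algorithm: the `score` and `width` callables stand in for the learned pairwise model's preference estimate and its confidence-interval width, both hypothetical names introduced here for illustration.

```python
import itertools
import random

def partition_and_rank(docs, score, width, threshold=0.0):
    """Toy sketch: split candidate pairs by confidence, explore the rest.

    score(a, b): hypothetical estimate that a should rank above b
    width(a, b): hypothetical confidence-interval width of that estimate
    """
    certain, uncertain = [], []
    for a, b in itertools.combinations(docs, 2):
        est, w = score(a, b), width(a, b)
        # A pair's order counts as "certain" only if the whole
        # confidence interval lies on one side of the threshold.
        if abs(est - threshold) > w:
            certain.append((a, b) if est > threshold else (b, a))
        else:
            uncertain.append((a, b))
    # Exploit the certain pairs: use pairwise win counts as a
    # simple proxy for the induced ranking.
    wins = {d: 0 for d in docs}
    for winner, _ in certain:
        wins[winner] += 1
    ranked = sorted(docs, key=lambda d: wins[d], reverse=True)
    # Explore only the uncertain pairs, e.g. by randomizing their order.
    explored = [(a, b) if random.random() < 0.5 else (b, a)
                for a, b in uncertain]
    return ranked, explored
```

With a confident model (narrow intervals) every pair is exploited and no exploration happens; with wide intervals all pairs are explored, which mirrors the abstract's point that exploration is confined to the uncertain region.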