The recent literature on online learning to rank (LTR) has established the utility of prior knowledge for Bayesian ranking bandit algorithms. However, a major limitation of existing work is that it requires the prior used by the algorithm to match the true prior. In this paper, we propose and analyze adaptive algorithms that address this issue, and we extend these results to linear and generalized linear models. We also consider scalar relevance feedback in addition to click feedback. Finally, we demonstrate the efficacy of our algorithms through experiments on both synthetic and real-world data.