Multiple-play bandits aim to display relevant items at relevant positions on a web page. We introduce PB-MHB, a new bandit-based algorithm for online recommender systems built on the Thompson sampling framework. This algorithm handles a display setting governed by the position-based model. Our sampling method does not require as input the probability that a user looks at a given position in the web page, a quantity which is, in practice, very difficult to obtain. Experiments on simulated and real datasets show that our method, with less prior information, delivers better recommendations than state-of-the-art algorithms.
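For concreteness, here is a minimal sketch of the position-based model the abstract refers to: the probability of a click on item i displayed at position l factorises as theta_i (item attractiveness) times kappa_l (the probability that the user looks at position l). The names theta and kappa follow standard position-based-model notation and the values below are illustrative; PB-MHB itself does not need kappa as input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden environment parameters (unknown to the learning algorithm):
theta = np.array([0.9, 0.6, 0.3, 0.1])  # per-item attractiveness
kappa = np.array([1.0, 0.7, 0.4])       # per-position examination probabilities

def play(recommended_items):
    """Display the given item indices, one per position, and return the
    observed click vector under the position-based model."""
    click_probs = theta[recommended_items] * kappa[: len(recommended_items)]
    return rng.random(len(click_probs)) < click_probs

# Example round: show items 2, 0, 1 at positions 1..3 and observe clicks.
clicks = play(np.array([2, 0, 1]))
print(clicks)
```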