We propose ${\tt AdaTS}$, a Thompson sampling algorithm that adapts sequentially to the bandit tasks that it interacts with. The key idea in ${\tt AdaTS}$ is to adapt to an unknown task prior distribution by maintaining a distribution over its parameters. When solving a bandit task, that uncertainty is marginalized out and properly accounted for. ${\tt AdaTS}$ is a fully Bayesian algorithm that can be implemented efficiently in several classes of bandit problems. We derive upper bounds on its Bayes regret that quantify the loss due to not knowing the task prior, and show that this loss is small. Our theory is supported by experiments, where ${\tt AdaTS}$ outperforms prior algorithms and works well even in challenging real-world problems.
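To make the idea concrete, the following is a minimal toy sketch (not the paper's exact model or algorithm) of Thompson sampling with an unknown task prior. We assume $K$-armed Gaussian bandit tasks whose arm means are drawn from $\mathcal{N}(\mu_*, \sigma_0^2)$, where the prior mean $\mu_*$ is itself unknown with hyper-prior $\mathcal{N}(\nu, q^2)$. The agent maintains a meta-posterior over $\mu_*$, samples from it inside each round, and refines it across tasks; all names and constants here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchical setup (illustrative, not the paper's exact model):
# arm means theta_k ~ N(mu_star, sigma0^2), with mu_star ~ N(nu, q^2).
K, sigma, sigma0 = 5, 1.0, 0.5   # arms, reward noise, task-prior width
nu, q = 0.0, 1.0                 # hyper-prior on the unknown prior mean
mu_star = rng.normal(nu, q)      # true (unknown) task-prior mean

# Meta-posterior over mu_star, refined across tasks via Gaussian conjugacy.
meta_mean, meta_var = nu, q ** 2

for task in range(20):
    theta = rng.normal(mu_star, sigma0, K)  # draw a fresh bandit task
    n, s = np.zeros(K), np.zeros(K)         # per-arm pull counts and reward sums
    for t in range(100):
        # Sample the prior mean from the meta-posterior, then sample arm
        # means from the induced per-task posterior (Thompson sampling).
        m_hat = rng.normal(meta_mean, np.sqrt(meta_var))
        post_var = 1.0 / (1.0 / sigma0 ** 2 + n / sigma ** 2)
        post_mean = post_var * (m_hat / sigma0 ** 2 + s / sigma ** 2)
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        r = rng.normal(theta[arm], sigma)
        n[arm] += 1
        s[arm] += r
    # Meta-update: given mu_star, each arm's sample mean is an independent
    # noisy observation of mu_star with variance sigma0^2 + sigma^2 / n_k.
    for k in range(K):
        if n[k] > 0:
            obs_var = sigma0 ** 2 + sigma ** 2 / n[k]
            new_var = 1.0 / (1.0 / meta_var + 1.0 / obs_var)
            meta_mean = new_var * (meta_mean / meta_var + (s[k] / n[k]) / obs_var)
            meta_var = new_var
```

As the agent sees more tasks, the meta-posterior variance shrinks and the per-task behavior approaches Thompson sampling under the true prior, which is the intuition behind the regret bounds described above.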