We consider a variant of the best arm identification (BAI) problem in multi-armed bandits (MAB) in which there are two sets of arms (source and target), and the objective is to determine the best target arm while only pulling source arms. In this paper, we study the setting in which, although the means are unknown, there is a known additive relationship between the source and target MAB instances. We show how our framework covers a range of previously studied pure exploration problems and additionally captures new problems. We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability. Our theoretical analysis highlights aspects of this transfer learning problem that do not arise in the typical BAI setup, yet it recovers the LUCB algorithm for single-domain BAI as a special case.
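For orientation, the single-domain special case mentioned above can be illustrated with the standard LUCB procedure. The following is a minimal Python sketch of that textbook procedure, not the transfer algorithm proposed in this paper. It assumes rewards in $[0, 1]$ and a Hoeffding-style confidence radius; the `pull` callback, the `max_pulls` budget, and the constants inside the radius are illustrative assumptions.

```python
import math
import random


def lucb(pull, n_arms, epsilon=0.1, delta=0.05, max_pulls=100_000):
    """Standard single-domain LUCB sketch; needs n_arms >= 2 and rewards in [0, 1].

    `pull(a)` is a hypothetical callback returning one reward sample for arm `a`.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    def sample(a):
        sums[a] += pull(a)
        counts[a] += 1

    def radius(a, t):
        # Hoeffding-style radius; the constant 5/4 is an illustrative choice.
        return math.sqrt(math.log(5 * n_arms * t**4 / (4 * delta)) / (2 * counts[a]))

    # Pull every arm once so that all empirical means are defined.
    for a in range(n_arms):
        sample(a)
    t = n_arms

    while t < max_pulls:
        means = [sums[a] / counts[a] for a in range(n_arms)]
        # Leader: arm with the highest empirical mean.
        leader = max(range(n_arms), key=lambda a: means[a])
        # Challenger: arm with the highest upper confidence bound among the rest.
        challenger = max(
            (a for a in range(n_arms) if a != leader),
            key=lambda a: means[a] + radius(a, t),
        )
        ucb_challenger = means[challenger] + radius(challenger, t)
        lcb_leader = means[leader] - radius(leader, t)
        # Stop once no other arm can plausibly beat the leader by epsilon or more.
        if ucb_challenger - lcb_leader < epsilon:
            return leader
        sample(leader)
        sample(challenger)
        t += 2

    # Budget exhausted: fall back to the empirically best arm.
    return max(range(n_arms), key=lambda a: sums[a] / counts[a])


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.9, 0.5, 0.3]  # hypothetical Bernoulli arm means
    best = lucb(lambda a: float(rng.random() < true_means[a]), n_arms=3)
    print("selected arm:", best)
```

In the transfer setting studied here, samples would come only from source arms and the confidence bounds would have to be mapped to the target arms through the known additive relationship; the sketch above deliberately leaves that step out.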