Max-Quantile Grouped Infinite-Arm Bandits

In this paper, we consider a bandit problem in which there are a number of groups, each consisting of infinitely many arms. Whenever a new arm is requested from a given group, its mean reward is drawn from an unknown reservoir distribution (different for each group), and the uncertainty in the arm's mean reward can only be reduced via subsequent pulls of the arm. The goal is to identify the infinite-arm group whose reservoir distribution has the highest $(1-\alpha)$-quantile (e.g., the median if $\alpha = \frac{1}{2}$), using as few total arm pulls as possible. We introduce a two-step algorithm that first requests a fixed number of arms from each group and then runs a finite-arm grouped max-quantile bandit algorithm. We characterize both the instance-dependent and worst-case regret, and provide a matching lower bound for the latter, while discussing various strengths, weaknesses, algorithmic improvements, and potential lower bounds associated with our instance-dependent upper bounds.
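The two-step structure described above can be illustrated with a minimal Python sketch. Everything beyond that structure is an assumption: the reservoirs are hypothetical uniform distributions over arm means, rewards are Bernoulli, `arms_per_group` and `pulls_per_arm` are free parameters rather than values derived in the paper, and the second step uses plain uniform sampling in place of a finite-arm grouped max-quantile algorithm. The names `two_step_max_quantile`, `empirical_quantile`, and `Reservoir` are illustrative only. The sketch shows how each group's $(1-\alpha)$-quantile is estimated from the arms requested in step 1; it does not reproduce the paper's adaptive allocation or its regret guarantees.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a reservoir returns the hidden mean reward of a freshly
# requested arm. The paper does not prescribe this representation.
Reservoir = Callable[[random.Random], float]


def empirical_quantile(values: Sequence[float], level: float) -> float:
    """Lower empirical `level`-quantile: the ceil(level * m)-th smallest of m values.

    This is one of several common quantile conventions, chosen here as an
    assumption of the sketch.
    """
    ordered = sorted(values)
    k = max(1, math.ceil(level * len(ordered)))
    return ordered[k - 1]


def two_step_max_quantile(
    reservoirs: Sequence[Reservoir],
    alpha: float,
    arms_per_group: int,
    pulls_per_arm: int,
    rng: random.Random,
) -> Tuple[int, int]:
    """Return (index of the group judged best, total number of arm pulls)."""
    # Step 1: request a fixed number of arms from every group. The learner
    # never reads these means directly; it only observes pull outcomes.
    hidden_means: List[List[float]] = [
        [draw(rng) for _ in range(arms_per_group)] for draw in reservoirs
    ]

    # Step 2 (simplified stand-in for a finite-arm grouped max-quantile
    # algorithm): pull every arm the same number of times instead of
    # allocating pulls adaptively.
    scores = []
    for group in hidden_means:
        estimates = []
        for mu in group:
            # Bernoulli(mu) rewards are an assumption of this sketch.
            successes = sum(rng.random() < mu for _ in range(pulls_per_arm))
            estimates.append(successes / pulls_per_arm)
        # Score the group by the empirical (1 - alpha)-quantile of its
        # estimated arm means.
        scores.append(empirical_quantile(estimates, 1 - alpha))

    # Pick the group with the largest estimated quantile (first one on ties).
    best = max(range(len(scores)), key=scores.__getitem__)
    total_pulls = len(reservoirs) * arms_per_group * pulls_per_arm
    return best, total_pulls


if __name__ == "__main__":
    # Three hypothetical groups whose reservoirs are uniform on disjoint intervals.
    groups = [
        lambda r: r.uniform(0.0, 0.2),
        lambda r: r.uniform(0.8, 1.0),
        lambda r: r.uniform(0.3, 0.5),
    ]
    best, pulls = two_step_max_quantile(
        groups, alpha=0.5, arms_per_group=20, pulls_per_arm=200, rng=random.Random(0)
    )
    print(f"selected group {best} after {pulls} pulls")
```

Uniform sampling spends the same budget on every arm, which keeps the sketch easy to follow but is wasteful; an adaptive finite-arm algorithm would typically stop pulling arms once they can no longer change which group's quantile is largest.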
