In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlapping groups, and the goal is to find a group whose worst arm has the highest mean reward. This problem is of interest in applications such as recommendation systems, and is also closely related to widely-studied robust optimization problems. We present two algorithms based on successive elimination and robust optimization, and derive upper bounds on the number of samples required to guarantee finding a max-min optimal or near-optimal group, as well as an algorithm-independent lower bound. We discuss the degree of tightness of our bounds in various cases of interest, and the difficulties in deriving uniformly tight bounds.
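To make the max-min objective concrete, the following is a minimal sketch (with hypothetical arm means and groups chosen purely for illustration) of the selection criterion: among possibly-overlapping groups of arms, pick the group whose worst arm has the highest mean reward.

```python
# Max-min grouped bandit objective: each group is a list of arm indices,
# and groups may overlap. The optimal group maximizes the minimum mean
# reward among its arms. The means below are hypothetical; in the bandit
# setting they are unknown and must be estimated from samples.

def max_min_group(means, groups):
    """Return the index of the group whose worst arm has the highest mean."""
    return max(range(len(groups)),
               key=lambda g: min(means[a] for a in groups[g]))

means = [0.9, 0.4, 0.7, 0.6]       # true (unknown in practice) arm means
groups = [[0, 1], [2, 3], [0, 2]]  # possibly-overlapping groups
best = max_min_group(means, groups)
# worst-arm means per group: 0.4, 0.6, 0.7 -> group [0, 2] is max-min optimal
```

The sampling algorithms described in the paper aim to identify this group (exactly or approximately) from noisy reward observations rather than from the true means.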