In the fixed-budget thresholding bandit problem, an algorithm sequentially allocates a limited budget of samples to different distributions, then predicts whether the mean of each distribution lies above or below a given threshold. We introduce a large family of algorithms (encompassing most existing relevant ones), inspired by the Frank-Wolfe algorithm, and provide a thorough yet generic analysis of their performance. This allows us to construct, for a broad class of problems, new explicit algorithms whose losses are within a small constant factor of those of the non-adaptive oracle. Interestingly, we observe that adaptive methods empirically outperform non-adaptive oracles by a large margin, an uncommon behavior in standard online learning settings such as regret minimization. We explain this surprising phenomenon through an insightful toy problem.
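To make the problem setting concrete, here is a minimal simulation sketch of the fixed-budget thresholding bandit with a non-adaptive uniform allocation baseline. The instance (Gaussian arms, the specific means, threshold `tau`, and budget `T`) is hypothetical and chosen only for illustration; it is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: K Gaussian arms with unit variance,
# a threshold tau, and a total sampling budget T.
means = np.array([0.2, 0.45, 0.55, 0.8])
tau = 0.5
T = 400
K = len(means)

# Non-adaptive uniform allocation: spend T // K samples on each arm,
# regardless of how close its empirical mean is to the threshold.
samples_per_arm = T // K
estimates = np.array(
    [rng.normal(mu, 1.0, samples_per_arm).mean() for mu in means]
)

# After the budget is spent, predict for every arm whether its mean
# lies above or below the threshold.
predictions = estimates > tau
truth = means > tau

# One common loss: 1 if any arm is misclassified, 0 otherwise.
loss = int((predictions != truth).any())
```

Adaptive methods of the kind analyzed above would instead concentrate samples on arms whose means are close to `tau` (here the 0.45 and 0.55 arms), which is why they can beat this uniform baseline.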