Elimination algorithms for bandit identification, which prune the plausible correct answers sequentially until only one remains, are computationally convenient since they reduce the problem size over time. However, existing elimination strategies are often not fully adaptive (they update their sampling rule infrequently) and are not easy to extend to combinatorial settings, where the set of answers is exponentially large in the problem dimension. On the other hand, most existing fully-adaptive strategies to tackle general identification problems are computationally demanding since they repeatedly test the correctness of every answer, without ever reducing the problem size. We show that adaptive methods can be modified to use elimination in both their stopping and sampling rules, hence obtaining the best of these two worlds: the algorithms (1) remain fully adaptive, (2) suffer a sample complexity that is never worse of their non-elimination counterpart, and (3) provably eliminate certain wrong answers early. We confirm these benefits experimentally, where elimination improves significantly the computational complexity of adaptive methods on common tasks like best-arm identification in linear bandits.
翻译:然而,现有的消除战略往往不完全适应(它们不经常更新取样规则),而且不容易推广到组合环境,因为一组答案在问题层面是巨大的。 另一方面,大多数现有的解决一般识别问题的完全适应性战略在计算上要求很高,因为它们反复测试每个答案的正确性,而从未减少问题大小。 我们表明,适应方法可以修改,以便在其拦截和取样规则中采用消除方法,从而获得这两个世界中的最佳方法:(1) 算法仍然完全适应,(2) 其抽样复杂性永远不比非消灭对应方更差,(3) 早期消除某些错误答案。 我们实验性地确认,消除这些好处,因为消除这些好处极大地提高了在诸如线性土匪中最佳武器识别等共同任务上适应方法的计算复杂性。