We study the problem of best-arm identification (BAI) in contextual bandits in the fixed-budget setting. We propose a general successive elimination algorithm that proceeds in stages and eliminates a fixed fraction of suboptimal arms in each stage. This design takes advantage of the strengths of static and adaptive allocations. We analyze the algorithm in linear models and obtain a better error bound than prior work. We also apply it to generalized linear models (GLMs) and bound its error. This is the first BAI algorithm for GLMs in the fixed-budget setting. Our extensive numerical experiments show that our algorithm outperforms the state of art.
翻译:我们研究了在固定预算环境中背景强盗中的最佳武器识别(BAI)问题,我们建议采用一个总体连续消除算法,分阶段进行,在每个阶段消除一定的次优武器部分,这种设计利用静态和适应性分配的优势,我们用线性模型分析算法,并获得比先前工作更好的错误。我们还将它应用到通用线性模型(GLMs)中,并约束它的错误。这是在固定预算环境中对GLMs的第一个BAI算法。我们广泛的数字实验显示,我们的算法优于现代水平。