SPRT 以SPRT为基础、高效率、高效、最佳武器识别坑式强盗中的最佳武器识别方法 (SPRT-based Efficient Best Arm Identification in Stochastic Bandits)

This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed confidence setting. The general class of the exponential family of bandits is considered. The state-of-the-art algorithms for the exponential family of bandits face computational challenges. To mitigate these challenges, a novel framework is proposed, which views the BAI problem as sequential hypothesis testing, and is amenable to tractable analysis for the exponential family of bandits. Based on this framework, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests. This algorithm has three features for both settings: (1) its sample complexity is asymptotically optimal, (2) it is guaranteed to be $\delta-$PAC, and (3) it addresses the computational challenge of the state-of-the-art approaches. Specifically, these approaches, which are focused only on the Gaussian setting, require Thompson sampling from the arm that is deemed the best and a challenger arm. This paper analytically shows that identifying the challenger is computationally expensive and that the proposed algorithm circumvents it. Finally, numerical experiments are provided to support the analysis.

翻译：本文调查了固定信心环境中多武装盗匪中最好的手臂识别(BAI)问题。考虑了强盗成倍家族的总类别。强盗成倍家族最先进的算法面临计算挑战。为了减轻这些挑战,提出了一个新的框架,将BAI问题视为顺序假设测试,并便于对强盗成倍家族进行可移植分析。基于这个框架, BAI算法设计了一种能够利用罐头序列概率比测试的BAI算法。这个算法对两种环境都有三种特征:(1) 其样本复杂性是微不足道的,(2) 保证是美元=delta-$PAC, 和(3) 处理最先进方法的计算挑战。具体地说,这些方法只侧重于高斯环境,要求从被视为最佳和挑战手的手臂中采集汤普森样本。本文分析显示, 确定挑战者是计算成本昂贵的, 并且拟议的算法绕过它。最后, 提供数字实验是为了支持分析。