Probabilistic Sequential Shrinking: A Best Arm Identification Algorithm for Stochastic Bandits with Corruptions

We consider a best arm identification (BAI) problem for stochastic bandits with adversarial corruptions in the fixed-budget setting of $T$ steps. We design a novel randomized algorithm, Probabilistic Sequential Shrinking($u$) (PSS($u$)), which is agnostic to the amount of corruptions. When the amount of corruptions per step (CPS) is below a threshold, PSS($u$) identifies the best arm or item with probability tending to $1$ as $T\rightarrow \infty$. Otherwise, the optimality gap of the identified item degrades gracefully with the CPS. We argue that such a bifurcation is necessary. In PSS($u$), the parameter $u$ serves to balance between the optimality gap and the success probability. The injection of randomization is shown to be essential to mitigate the impact of corruptions. To demonstrate this, we design two attack strategies that are applicable to any algorithm. We apply one of them to Successive Halving (SH) by Karnin et al. (2013), a deterministic analogue of PSS($u$). The attack strategy results in a high failure probability for SH, whereas PSS($u$) remains robust. In the absence of corruptions, the performance guarantee of PSS($2$) matches that of SH. We also show that when the CPS is sufficiently large, no algorithm can achieve a BAI probability tending to $1$ as $T\rightarrow \infty$. Numerical experiments corroborate our theoretical findings.
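To make the contrast between deterministic and randomized elimination concrete, the following is a minimal sketch, not the paper's specification. `successive_halving` follows the standard fixed-budget scheme of Karnin et al. (2013): split the budget over about $\log_2 K$ rounds, pull every surviving arm equally often, and keep the better half. `randomized_shrinking` only illustrates the idea stated in the abstract (pull a uniformly random surviving arm at each step, then shrink the active set by a factor $u$ per phase). Its phase lengths, the treatment of never-pulled arms, and all names (`pull`, `num_arms`, `budget`, `rng`) are assumptions made for illustration and may differ from the actual PSS($u$).

```python
import math
import random


def successive_halving(pull, num_arms, budget):
    """Deterministic Successive Halving sketch: equal pulls, keep the top half."""
    active = list(range(num_arms))
    rounds = max(1, math.ceil(math.log2(num_arms)))
    for _ in range(rounds):
        # Every surviving arm receives the same, predictable number of pulls.
        pulls_per_arm = budget // (len(active) * rounds)
        means = {}
        for arm in active:
            rewards = [pull(arm) for _ in range(pulls_per_arm)]
            means[arm] = sum(rewards) / max(1, len(rewards))
        active.sort(key=lambda a: means[a], reverse=True)
        active = active[: math.ceil(len(active) / 2)]
    return active[0]


def randomized_shrinking(pull, num_arms, budget, u=2.0, rng=None):
    """Illustrative randomized variant (assumed details, not the paper's PSS(u))."""
    rng = rng or random.Random()
    active = list(range(num_arms))
    phases = max(1, math.ceil(math.log(num_arms, u)))
    steps_per_phase = budget // phases
    for m in range(1, phases + 1):
        sums = {a: 0.0 for a in active}
        counts = {a: 0 for a in active}
        for _ in range(steps_per_phase):
            # The pulled arm is random, so the schedule is not known in advance.
            arm = rng.choice(active)
            sums[arm] += pull(arm)
            counts[arm] += 1

        def mean(a):
            # Assumption: an arm that was never pulled ranks last.
            return sums[a] / counts[a] if counts[a] else float("-inf")

        active.sort(key=mean, reverse=True)
        keep = max(1, math.ceil(num_arms / u ** m))
        active = active[:keep]
    return active[0]
```

Here `pull(arm)` is a hypothetical callback returning one (possibly corrupted) reward. In the deterministic routine the pull schedule is fixed ahead of time, which is what lets an adversary aim corruptions at specific pulls. Random sampling removes that predictability, which is the intuition the abstract gives for why randomization matters. A larger `u` discards arms faster and uses fewer phases.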
