We study the corrupted bandit problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent adversary or Nature. To be specific, the reward obtained by playing an arm comes from corresponding heavy-tailed reward distribution with probability $1-\varepsilon \in (0.5,1]$ and an arbitrary corruption distribution of unbounded support with probability $\varepsilon \in [0,0.5)$. First, we provide $\textit{a problem-dependent lower bound on the regret}$ of any corrupted bandit algorithm. The lower bounds indicate that the corrupted bandit problem is harder than the classical stochastic bandit problem with sub-Gaussian or heavy-tail rewards. Following that, we propose a novel UCB-type algorithm for corrupted bandits, namely HubUCB, that builds on Huber's estimator for robust mean estimation. Leveraging a novel concentration inequality of Huber's estimator, we prove that HubUCB achieves a near-optimal regret upper bound. Since computing Huber's estimator has quadratic complexity, we further introduce a sequential version of Huber's estimator that exhibits linear complexity. We leverage this sequential estimator to design SeqHubUCB that enjoys similar regret guarantees while reducing the computational burden. Finally, we experimentally illustrate the efficiency of HubUCB and SeqHubUCB in solving corrupted bandits for different reward distributions and different levels of corruptions.
翻译:自然损坏的赌博机问题:遗憾的下限和鲁棒乐观算法
我们研究了被自然损坏的赌博机问题,即具有$k$个未知奖励分布的随机多臂赌博机问题,这些奖励分布呈重尾分布。奖励由相应的重尾奖励分布以$1-\varepsilon\in(0.5,1]$的概率和任意的污染分布以$\varepsilon\in[0,0.5)$的概率给出。首先,我们提供了任何损坏赌博机算法的遗憾下限。这些下界表明,损坏赌博机问题比具有亚高斯或重尾奖励的经典随机赌博机问题更难。接下来,我们提出了一种新型的损坏赌博机UCB算法,即HubUCB,它建立在Huber的鲁棒均值估计上。利用Huber估计的新的集中不等式,我们证明了HubUCB能够获得几乎最优的遗憾上限。由于计算Huber估计具有二次复杂度,我们进一步介绍了一个连续的版本,该版本具有线性的复杂度。我们利用这个连续的估计来设计SeqHubUCB,它具有类似的遗憾保证,同时减少了计算负担。最后,我们实验室演示了HubUCB和SeqHubUCB在解决不同奖励分布和不同污染水平的损坏赌博机方面的效率。