This paper considers two fundamental sequential decision-making problems: prediction with expert advice and the multi-armed bandit problem. We focus on stochastic regimes in which an adversary may corrupt losses, and we investigate what level of robustness can be achieved against adversarial corruption. The main contribution of this paper is to show that optimal robustness can be expressed by a square-root dependency on the amount of corruption. More precisely, we show that two classes of algorithms, anytime Hedge with a decreasing learning rate and algorithms with second-order regret bounds, achieve $O\left( \frac{\log N}{\Delta} + \sqrt{\frac{C \log N}{\Delta}} \right)$ regret, where $N$, $\Delta$, and $C$ represent the number of experts, the gap parameter, and the corruption level, respectively. We further provide a matching lower bound, which shows that this regret bound is tight up to a constant factor. For the multi-armed bandit problem, we also provide a lower bound that is tight up to a logarithmic factor.
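To make the first algorithm class concrete, the following is a minimal sketch of anytime Hedge with a decreasing learning rate. The schedule $\eta_t = \sqrt{\log(N)/t}$ used here is one standard choice for anytime guarantees and is an assumption for illustration; the paper's exact schedule and constants may differ.

```python
import numpy as np

def anytime_hedge(loss_matrix):
    """Anytime Hedge (exponential weights) with a decreasing learning rate.

    loss_matrix: (T, N) array of losses in [0, 1] for N experts over T rounds.
    Returns the (T, N) array of weight vectors played in each round.

    Assumed schedule: eta_t = sqrt(log(N) / t); the paper's exact choice
    of learning rate may differ from this illustrative one.
    """
    T, N = loss_matrix.shape
    cum_loss = np.zeros(N)                  # cumulative losses of each expert
    weights_history = []
    for t in range(1, T + 1):
        eta = np.sqrt(np.log(N) / t)        # decreasing learning rate
        # Exponential weights on cumulative losses, shifted for numerical stability.
        z = -eta * (cum_loss - cum_loss.min())
        w = np.exp(z)
        w /= w.sum()
        weights_history.append(w)
        cum_loss += loss_matrix[t - 1]      # observe this round's losses
    return np.array(weights_history)
```

In each round the learner plays the weight vector before seeing that round's losses; as the corruption level $C$ grows, the cumulative-loss gaps driving the weights can be distorted by up to $C$ in total, which is where the square-root dependency on $C$ enters the regret analysis.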