In this work, we develop linear bandit algorithms that automatically adapt to different environments. By plugging a novel loss estimator into the optimization problem that characterizes the instance-optimal strategy, our first algorithm not only achieves nearly instance-optimal regret in stochastic environments, but also works in corrupted environments with additional regret proportional to the amount of corruption, while the state of the art (Li et al., 2019) achieves neither instance optimality nor the optimal dependence on the corruption amount. Moreover, by equipping this algorithm with an adversarial component and carefully designed tests, our second algorithm additionally enjoys minimax-optimal regret in completely adversarial environments, which to our knowledge is the first result of this kind. Finally, all our guarantees hold with high probability, while existing instance-optimal guarantees only hold in expectation.