We consider the problem of adversarial bandit convex optimization, that is, online learning over a sequence of arbitrary convex loss functions with only one function evaluation for each of them. While all previous works assume known and homogeneous curvature on these loss functions, we study a heterogeneous setting where each function has its own curvature that is only revealed after the learner makes a decision. We develop an efficient algorithm that is able to adapt to the curvature on the fly. Specifically, our algorithm not only recovers or \emph{even improves} existing results for several homogeneous settings, but also leads to surprising results for some heterogeneous settings -- for example, while Hazan and Levy (2014) showed that $\widetilde{O}(d^{3/2}\sqrt{T})$ regret is achievable for a sequence of $T$ smooth and strongly convex $d$-dimensional functions, our algorithm reveals that the same is achievable even if $T^{3/4}$ of them are not strongly convex, and sometimes even if a constant fraction of them are not strongly convex. Our approach is inspired by the framework of Bartlett et al. (2007), who studied a similar heterogeneous setting but with stronger gradient feedback. Extending their framework to the bandit feedback setting requires novel ideas such as lifting the feasible domain and using a logarithmically homogeneous self-concordant barrier regularizer.
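As a rough back-of-envelope reading of the $T^{3/4}$ threshold (a heuristic sketch for intuition, not the paper's actual analysis): over $k$ smooth but non-strongly-convex rounds, bandit feedback is known to permit regret of order $k^{2/3}$ up to dimension and logarithmic factors (Saha and Tewari, 2011), so splitting the horizon suggests
\[
\underbrace{\widetilde{O}\bigl(d^{3/2}\sqrt{T}\bigr)}_{\text{strongly convex rounds}}
\;+\;
\underbrace{\widetilde{O}\bigl(\mathrm{poly}(d)\,k^{2/3}\bigr)}_{\text{smooth, non-strongly-convex rounds}}
\qquad\text{with } k = T^{3/4}
\;\Longrightarrow\;
k^{2/3} = \bigl(T^{3/4}\bigr)^{2/3} = T^{1/2},
\]
so under this heuristic accounting the non-strongly-convex rounds contribute at most the same $\sqrt{T}$ order and the overall $\widetilde{O}(d^{3/2}\sqrt{T})$ bound is preserved.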