We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(\langle x, \theta\rangle)$, where each $g_t : \mathbb R \to \mathbb R$ is convex and $\theta \in \mathbb R^d$ is unknown and fixed across rounds. We provide a short information-theoretic proof that the minimax regret is at most $O(d \sqrt{n} \log(n \operatorname{diam}(\mathcal K)))$, where $n$ is the number of interactions, $d$ is the dimension and $\operatorname{diam}(\mathcal K)$ is the diameter of the constraint set $\mathcal K$.
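To make the setting concrete, the following minimal sketch builds ridge losses $f_t(x) = g_t(\langle x, \theta\rangle)$ and evaluates the regret of a sequence of actions against one fixed comparator point. Everything in it is an illustrative assumption rather than part of the analysis: the helper names `make_ridge_loss` and `regret`, the dimension $d = 2$, the direction `theta`, and the shifted absolute-value choices for $g_t$. In the bandit setting the learner would observe only the scalar $f_t(x_t)$ in each round, not $g_t$ or $\theta$.

```python
def make_ridge_loss(g, theta):
    """Return f(x) = g(<x, theta>) for a scalar convex function g (illustrative helper)."""
    def f(x):
        return g(sum(xi * ti for xi, ti in zip(x, theta)))
    return f


def regret(losses, actions, comparator):
    """Cumulative loss of the played actions minus that of one fixed comparator point."""
    return sum(f(x) - f(comparator) for f, x in zip(losses, actions))


# Hypothetical instance: d = 2, one hidden direction shared by all rounds.
theta = (0.6, 0.8)
shifts = [0.1, -0.2, 0.3]

# g_t(u) = |u - c_t| is convex in u, so each f_t is a convex ridge function.
losses = [make_ridge_loss(lambda u, c=c: abs(u - c), theta) for c in shifts]

# A naive learner that plays the origin in every round.
actions = [(0.0, 0.0)] * len(losses)

# A fixed comparator in the constraint set, chosen so that <comparator, theta> = 0.1.
comparator = (0.06, 0.08)

print(regret(losses, actions, comparator))
```

The minimax regret bounded in the abstract takes the worst case over the adversary's choices and the best case over learners; the sketch only evaluates the quantity for one fixed instance and one fixed comparator.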