We explore whether quantum advantages can be found for the zeroth-order online convex optimization problem, also known as bandit convex optimization with multi-point feedback. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions given access only to zeroth-order oracles; that is, each loss function is a black box that returns the function value at any queried input. The procedure can be described as a $T$-round iterative game between the player and the adversary. In this paper, we present quantum algorithms for the problem and show for the first time that quantum advantages are possible for online convex optimization. Specifically, our contributions are as follows. (i) When the player is allowed to query the zeroth-order oracles $O(1)$ times per round as feedback, we give a quantum algorithm that achieves $O(\sqrt{T})$ regret with no additional dependence on the dimension $n$, outperforming the known optimal classical algorithm, which achieves only $O(\sqrt{nT})$ regret. Note that the regret of our quantum algorithm matches the lower bound for classical first-order methods. (ii) For strongly convex loss functions, we show that the quantum algorithm achieves $O(\log T)$ regret, again with $O(1)$ queries per round, which matches the regret bound of classical algorithms in the full-information setting.
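To make the setting concrete, the following is a minimal sketch of the classical baseline only: online gradient descent with two-point zeroth-order feedback on a Euclidean ball. It is not the quantum algorithm of this paper. The function names (`two_point_bandit_ogd`, `regret`), the step size `radius / sqrt(n * t)`, the smoothing parameter `delta`, and the ball-shaped feasible set are all assumptions made for illustration. For simplicity the sketch charges the player the loss at the current iterate rather than at the two queried points.

```python
import math
import random


def random_unit_vector(n, rng):
    """Draw a direction uniformly from the unit sphere in R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def project_to_ball(x, radius):
    """Euclidean projection onto the origin-centered ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= radius:
        return x
    return [c * radius / norm for c in x]


def two_point_bandit_ogd(losses, n, radius=1.0, delta=1e-3, seed=0):
    """Classical two-point-feedback online gradient descent (illustrative).

    losses: list of callables f_t(x) -> float, used only as zeroth-order oracles.
    Returns the list of points played, one per round. Assumes delta < radius.
    """
    rng = random.Random(seed)
    x = [0.0] * n
    played = []
    for t, f in enumerate(losses, start=1):
        played.append(list(x))
        u = random_unit_vector(n, rng)
        # Two oracle queries per round: f_t at x + delta*u and at x - delta*u.
        x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
        x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
        # Finite-difference estimate of the gradient along the random direction u.
        scale = n * (f(x_plus) - f(x_minus)) / (2.0 * delta)
        grad_est = [scale * ui for ui in u]
        eta = radius / math.sqrt(n * t)  # assumed step size, not tuned
        step = [xi - eta * gi for xi, gi in zip(x, grad_est)]
        # Shrink the ball by delta so both query points stay feasible.
        x = project_to_ball(step, radius - delta)
    return played


def regret(losses, played, comparator):
    """Cumulative loss of the player minus that of a fixed comparator point."""
    return sum(f(x) - f(comparator) for f, x in zip(losses, played))
```

The factor `n` in the gradient estimate is where the dimension enters the classical analysis; the abstract's claim is that the quantum algorithm removes this dependence from the regret bound while still using $O(1)$ queries per round.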