The kernel-based bandit problem is an extensively studied black-box optimization problem in which the objective function is assumed to live in a known reproducing kernel Hilbert space. While nearly optimal regret bounds (up to logarithmic factors) have been established in the noisy setting, surprisingly little is known about the noise-free setting, in which the exact values of the underlying function are accessible without observation noise. We discuss several upper bounds on regret, none of which seems order optimal, and provide a conjecture on the order-optimal regret bound.