In this paper, we improve the regret bound for online kernel selection under bandit feedback. The previous algorithm enjoys an $O((\Vert f\Vert^2_{\mathcal{H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds that improve on this result. For smooth loss functions, we propose an algorithm with an $O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum^K_{i=1}L_T(f^\ast_i))^{\frac{2}{3}})$ expected bound, where $L_T(f^\ast_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_{i}=\{f\in\mathcal{H}_i:\Vert f\Vert_{\mathcal{H}_i}\leq U\}$. This data-dependent bound preserves the previous worst-case bound and is smaller whenever most of the candidate kernels match the data well. For Lipschitz loss functions, we propose an algorithm with an $O(U\sqrt{KT}\ln^{\frac{2}{3}}{T})$ expected bound, asymptotically improving on the previous bound. We further apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds that match or improve the previous $O(\sqrt{T\ln{K}}+\Vert f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$ expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
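For concreteness, the expected regret being bounded follows the standard online-learning definition, sketched below; the per-round hypothesis $f_t$, loss $\ell$, and example $(x_t,y_t)$ are notational assumptions, since the abstract does not fix them:
$$\mathbb{E}\big[\mathrm{Reg}_T(\mathbb{H}_i)\big]=\mathbb{E}\Big[\sum^T_{t=1}\ell(f_t(x_t),y_t)\Big]-L_T(f^\ast_i),\qquad L_T(f^\ast_i)=\min_{f\in\mathbb{H}_i}\sum^T_{t=1}\ell(f(x_t),y_t),$$
where the expectation is over the algorithm's randomness (the bandit-feedback kernel choices) and $f^\ast_i$ attains the minimum over $\mathbb{H}_i$.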