We resolve the long-standing "impossible tuning" issue for the classic expert problem and show that it is in fact possible to achieve regret $O\left(\sqrt{(\ln d)\sum_t \ell_{t,i}^2}\right)$ simultaneously for all experts $i$ in a $T$-round $d$-expert problem, where $\ell_{t,i}$ is the loss for expert $i$ in round $t$. Our algorithm is based on the Mirror Descent framework with a correction term and a weighted entropy regularizer. While natural, the algorithm has not been studied before and requires a careful analysis. We also generalize the bound to $O\left(\sqrt{(\ln d)\sum_t (\ell_{t,i}-m_{t,i})^2}\right)$ for any prediction vector $m_t$ that the learner receives, and recover or improve many existing results by choosing different $m_t$. Furthermore, we use the same framework to create a master algorithm that combines a set of base algorithms and learns the best one with little overhead. The new guarantee of our master allows us to derive many new results for both the expert problem and, more generally, Online Linear Optimization.
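To make the template concrete, the following is a minimal sketch of an optimistic multiplicative-weights-style update with per-expert learning rates and a squared-deviation correction term, which is the general shape of algorithm the abstract describes. It is illustrative only: the function name, the fixed learning rates `etas`, and the closed-form exponential update are assumptions for this sketch, whereas the actual algorithm is a Mirror Descent update with a weighted entropy regularizer and carefully chosen time-varying learning rates.

```python
import numpy as np

def optimistic_weights_with_correction(losses, predictions, etas):
    """Hedged sketch: optimistic exponential-weights update with a
    second-order correction term and per-expert learning rates.

    losses:      (T, d) array, loss of each expert in each round.
    predictions: (T, d) array, the prediction vector m_t received each round.
    etas:        (d,) array of per-expert learning rates (fixed here for simplicity).

    Returns the (T, d) array of probability vectors played.
    """
    T, d = losses.shape
    log_w = np.zeros(d)          # log-weights, start from the uniform distribution
    plays = np.zeros((T, d))
    for t in range(T):
        # Optimistic step: tilt the weights by the predicted loss m_t.
        logits = log_w - etas * predictions[t]
        logits -= logits.max()   # numerical stability before exponentiating
        p = np.exp(logits)
        p /= p.sum()
        plays[t] = p
        # Update with the true loss plus a correction proportional to
        # eta_i^2 * (loss - prediction)^2, mimicking the corrected MD update.
        diff = losses[t] - predictions[t]
        log_w -= etas * losses[t] + (etas ** 2) * diff ** 2
    return plays


if __name__ == "__main__":
    # Tiny usage example with synthetic losses; m_t = 0 corresponds to the
    # basic sqrt((ln d) * sum_t l_{t,i}^2)-type guarantee.
    rng = np.random.default_rng(0)
    T, d = 1000, 5
    losses = rng.uniform(size=(T, d))
    preds = np.zeros((T, d))
    etas = np.full(d, np.sqrt(np.log(d) / T))  # a simple non-adaptive choice
    plays = optimistic_weights_with_correction(losses, preds, etas)
    print(plays[-1])             # final distribution over the d experts
```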