We provide a new adaptive method for online convex optimization, MetaGrad, that is robust to general convex losses but achieves faster rates for a broad class of special functions, including not only exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. We prove this by drawing a connection to the Bernstein condition, which is known to imply fast rates in offline statistical learning. MetaGrad further adapts automatically to the size of the gradients. Its main feature is that it simultaneously considers multiple learning rates, which are weighted in direct proportion to their empirical performance on the data using a new meta-algorithm. We provide three versions of MetaGrad. The full matrix version maintains a full covariance matrix and is applicable to learning tasks for which we can afford update time quadratic in the dimension. The other two versions provide speed-ups for high-dimensional learning tasks with an update time that is linear in the dimension: one is based on sketching, the other on running a separate copy of the basic algorithm per coordinate. We evaluate all versions of MetaGrad on benchmark online classification and regression tasks, on which they consistently outperform both online gradient descent and AdaGrad.
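As a rough illustration of the multiple-learning-rate idea, the following Python sketch keeps one iterate per learning rate in a geometric grid, updates each on a quadratic surrogate loss, and lets an exponential-weights master combine them with the tilted (eta-weighted) mixture. This is a simplified sketch, not the paper's algorithm: the class name, the grid size, the step size, and the plain projected-gradient slave update are assumptions introduced here for illustration.

```python
import numpy as np


class MetaGradSketch:
    """Simplified MetaGrad-style learner (illustrative sketch, not the paper's exact updates).

    Each "slave" owns one learning rate eta from a geometric grid and its own iterate,
    updated by projected gradient descent on the quadratic surrogate
        ell_t^eta(u) = -eta * g_t . (w_t - u) + eta^2 * (g_t . (w_t - u))^2.
    A "master" weights the slaves proportionally to exp(-cumulative surrogate loss)
    and predicts with the eta-tilted mixture of their iterates.
    """

    def __init__(self, dim, diameter=1.0, lipschitz=1.0, n_etas=10):
        self.D = diameter
        # Geometric grid of learning rates (assumed spacing 2^-i below a cap).
        eta_max = 1.0 / (5.0 * lipschitz * diameter)
        self.etas = eta_max * 2.0 ** (-np.arange(n_etas))
        self.w = np.zeros((n_etas, dim))                 # one iterate per learning rate
        self.log_pi = np.full(n_etas, -np.log(n_etas))   # master's log-weights (uniform prior)

    def predict(self):
        # Tilted mixture: combination weights proportional to pi(eta) * eta.
        logits = self.log_pi + np.log(self.etas)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return p @ self.w

    def update(self, w_pred, grad):
        for i, eta in enumerate(self.etas):
            r = grad @ (w_pred - self.w[i])              # instantaneous regret proxy
            surrogate = -eta * r + (eta * r) ** 2        # slave's surrogate loss at its iterate
            self.log_pi[i] -= surrogate                  # exponential-weights master update
            # Slave step: plain gradient descent on the surrogate (a simplification of
            # the paper's second-order slave update), projected back onto the ball.
            g_sur = eta * grad + 2.0 * eta**2 * (grad @ (self.w[i] - w_pred)) * grad
            self.w[i] -= (self.D / np.sqrt(1.0 + grad @ grad)) * g_sur
            norm = np.linalg.norm(self.w[i])
            if norm > self.D:
                self.w[i] *= self.D / norm
        self.log_pi -= self.log_pi.max()                 # numerical stabilization only


# Toy usage: online least-squares on synthetic data (illustrative only).
rng = np.random.default_rng(0)
learner = MetaGradSketch(dim=5, diameter=5.0)
w_true = rng.normal(size=5)
for _ in range(1000):
    x = rng.normal(size=5)
    y = w_true @ x
    w = learner.predict()
    grad = 2.0 * (w @ x - y) * x                         # gradient of squared loss at the prediction
    learner.update(w, grad)
```

In the paper the grid contains on the order of log T learning rates and the slaves run a second-order update on the surrogate, which is what gives rise to the full-matrix, sketched, and coordinate-wise variants; the plain projected-gradient slave above is only meant to convey the master/slave structure.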