We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent, and $\mathcal{O}(1/t)$ for normalized gradient descent. This momentum-based method is derived via the convex dual of the maximum-margin problem, specifically by applying Nesterov acceleration to this dual, which yields a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.
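Since the abstract does not spell out the update rule, the following is a minimal illustrative sketch of the general idea in Python: a Nesterov-style momentum step applied to the logistic loss, with the gradient normalized by the loss value so that step sizes remain meaningful as the loss decays on separable data. The function names, the step size `eta`, and the momentum schedule `beta = t/(t+3)` are standard placeholders assumed here for illustration; the paper's actual step sizes and momentum weights come from the dual derivation and differ from these.

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid


def normalized_logistic(w, X, y):
    """Logistic loss L(w) = mean(log(1 + exp(-y_i <x_i, w>))) and its gradient."""
    margins = y * (X @ w)
    loss = np.logaddexp(0.0, -margins).mean()
    sigma = expit(-margins)                      # exp(-m) / (1 + exp(-m))
    grad = -(X.T @ (y * sigma)) / len(y)
    return loss, grad


def momentum_descent(X, y, steps=1000, eta=1.0):
    """Nesterov-style momentum on the loss-normalized logistic gradient (sketch)."""
    _, d = X.shape
    w = np.zeros(d)
    w_prev = w.copy()
    for t in range(steps):
        beta = t / (t + 3)                       # generic Nesterov momentum schedule
        v = w + beta * (w - w_prev)              # look-ahead (momentum) point
        loss, grad = normalized_logistic(v, X, y)
        # Normalizing by L(v) counteracts the exponentially vanishing gradient.
        w_prev, w = w, v - eta * grad / max(loss, 1e-12)
    return w


# Usage on synthetic linearly separable data: the margin depends only on the
# direction of w, so we report min_i y_i <x_i, w> / ||w||.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
w = momentum_descent(X, y)
print("margin:", (y * (X @ w)).min() / np.linalg.norm(w))
```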