The classical Perceptron algorithm of Rosenblatt can be used to find a linear threshold function to correctly classify $n$ linearly separable data points, assuming the classes are separated by some margin $\gamma > 0$. A foundational result is that Perceptron converges after $\Omega(1/\gamma^{2})$ iterations. There have been several recent works that managed to improve this rate by a quadratic factor, to $\Omega(\sqrt{\log n}/\gamma)$, with more sophisticated algorithms. In this paper, we unify these existing results under one framework by showing that they can all be described through the lens of solving min-max problems using modern acceleration techniques, mainly through optimistic online learning. We then show that the proposed framework also leads to improved results for a series of problems beyond the standard Perceptron setting. Specifically, a) For the margin maximization problem, we improve the state-of-the-art result from $O(\log t/t^2)$ to $O(1/t^2)$, where $t$ is the number of iterations; b) We provide the first result on identifying the implicit bias property of the classical Nesterov accelerated gradient (NAG) algorithm, and show that NAG can maximize the margin at an $O(1/t^2)$ rate; c) For the classical $p$-norm Perceptron problem, we provide an algorithm with an $\Omega(\sqrt{(p-1)\log n}/\gamma)$ convergence rate, whereas existing algorithms suffer from an $\Omega((p-1)/\gamma^2)$ convergence rate.
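To make the min-max viewpoint concrete, the following display is a minimal illustrative sketch (the symbols $x_i$, $y_i$, $w$, $p$, and $\Delta_n$ are our own notation, not fixed by the abstract above): for data points $x_i$ with $\|x_i\|_2 \le 1$ and labels $y_i \in \{-1,+1\}$, finding a linear separator can be posed as the bilinear game
\[
\max_{\|w\|_2 \le 1} \; \min_{p \in \Delta_n} \; \sum_{i=1}^{n} p_i \, y_i \langle w, x_i \rangle ,
\]
where $\Delta_n$ is the probability simplex over the $n$ points; when the data are separable, the value of this game is the maximum margin $\gamma$, and the accelerated Perceptron variants referred to above can be read as (optimistic) no-regret dynamics played by the $w$- and $p$-players of such a game.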