Modern machine learning classifiers often exhibit vanishing classification error on the training set. They achieve this by learning nonlinear representations of the inputs that map the data into linearly separable classes. Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data. We consider a stylized setting in which the data $(y_i,{\boldsymbol x}_i)$, $i\le n$, are i.i.d., with ${\boldsymbol x}_i\sim\mathsf{N}({\boldsymbol 0},{\boldsymbol \Sigma})$ a $p$-dimensional Gaussian feature vector and $y_i \in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle {\boldsymbol \theta}_*,{\boldsymbol x}_i \rangle$. While the Gaussian model might appear extremely simplistic, universality arguments can be used to show that the results derived in this setting also apply to the output of certain nonlinear featurization maps. We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to \psi$, and derive exact expressions for the limiting generalization error. We use this theory to derive two results of independent interest: $(i)$ sufficient conditions on $({\boldsymbol \Sigma},{\boldsymbol \theta}_*)$ for `benign overfitting' that parallel previously derived conditions in the case of linear regression; $(ii)$ an asymptotically exact expression for the generalization error when max-margin classification is used in conjunction with feature vectors produced by random one-layer neural networks.
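The setting described in the abstract can be simulated at small scale. The sketch below is illustrative only and rests on several assumptions that are not taken from the abstract: the covariance is ${\boldsymbol \Sigma}={\boldsymbol I}_p$, the labels follow a logistic link in $\langle {\boldsymbol \theta}_*,{\boldsymbol x}_i \rangle$, the sizes are $n=30$, $p=90$ (so $p/n=3$), and the max-margin direction is computed by dual coordinate ascent on the hard-margin SVM without intercept. The helper names `sample_data`, `max_margin`, and `error_rate` are hypothetical.

```python
import math
import random


def sample_data(n, p, theta_star, rng):
    """Draw n pairs (x_i, y_i) with x_i ~ N(0, I_p) and labels in {+1, -1}.

    Assumptions (not from the abstract): Sigma = I_p and
    P(y = +1 | x) = 1 / (1 + exp(-<theta_star, x>)).
    """
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        score = sum(t * v for t, v in zip(theta_star, x))
        prob_plus = 1.0 / (1.0 + math.exp(-score))
        xs.append(x)
        ys.append(1 if rng.random() < prob_plus else -1)
    return xs, ys


def max_margin(xs, ys, sweeps=200):
    """Hard-margin SVM without intercept, solved by dual coordinate ascent.

    Maximizes sum(alpha) - 0.5 * ||w||^2 over alpha >= 0, where
    w = sum_i alpha_i * y_i * x_i. Assumes the data are linearly separable.
    """
    n, p = len(xs), len(xs[0])
    alpha = [0.0] * n
    w = [0.0] * p
    sq_norms = [sum(v * v for v in x) for x in xs]
    for _ in range(sweeps):
        for i in range(n):
            margin = ys[i] * sum(a * b for a, b in zip(w, xs[i]))
            # Exact maximization in coordinate i, projected onto alpha_i >= 0.
            new_alpha = max(0.0, alpha[i] + (1.0 - margin) / sq_norms[i])
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                for j in range(p):
                    w[j] += delta * ys[i] * xs[i][j]
                alpha[i] = new_alpha
    return w


def error_rate(w, xs, ys):
    """Fraction of points with y * <w, x> <= 0."""
    wrong = sum(
        1 for x, y in zip(xs, ys)
        if y * sum(a * b for a, b in zip(w, x)) <= 0.0
    )
    return wrong / len(xs)


rng = random.Random(0)
n, p = 30, 90                              # overparametrized: p / n = 3
theta_star = [2.0 / math.sqrt(p)] * p      # signal with Euclidean norm 2

train_x, train_y = sample_data(n, p, theta_star, rng)
w_hat = max_margin(train_x, train_y)

# Fresh samples give a Monte Carlo estimate of the generalization error.
test_x, test_y = sample_data(2000, p, theta_star, rng)
train_err = error_rate(w_hat, train_x, train_y)
test_err = error_rate(w_hat, test_x, test_y)
print(f"train error = {train_err:.3f}, test error = {test_err:.3f}")
```

With $p>n$ the Gaussian sample is linearly separable with probability one, so the training error of the max-margin classifier vanishes, while the test error estimates the quantity whose $n,p\to\infty$ limit the paper characterizes. A finite-size run of this kind is only a rough illustration of that limit.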