The generalization error of max-margin linear classifiers: Benign overfitting and high dimensional asymptotics in the overparametrized regime

Modern machine learning classifiers often exhibit vanishing classification error on the training set. They achieve this by learning nonlinear representations of the inputs that map the data into linearly separable classes. Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data. We consider a stylized setting in which data $(y_i,{\boldsymbol x}_i)$, $i\le n$, are i.i.d., with ${\boldsymbol x}_i\sim\mathsf{N}({\boldsymbol 0},{\boldsymbol \Sigma})$ a $p$-dimensional Gaussian feature vector and $y_i \in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle {\boldsymbol \theta}_*,{\boldsymbol x}_i \rangle$. While the Gaussian model might appear extremely simplistic, universality arguments can be used to show that the results derived in this setting also apply to the output of certain nonlinear featurization maps. We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to \psi$, and derive exact expressions for the limiting generalization error. We use this theory to derive two results of independent interest: $(i)$ Sufficient conditions on $({\boldsymbol \Sigma},{\boldsymbol \theta}_*)$ for `benign overfitting' that parallel previously derived conditions in the case of linear regression; $(ii)$ An asymptotically exact expression for the generalization error when max-margin classification is used in conjunction with feature vectors produced by random one-layer neural networks.
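
For concreteness, the objects named in the abstract can be written out in standard notation. This is a sketch under stated assumptions, not a quotation from the paper: the link function $f$, the estimator symbol $\hat{\boldsymbol \theta}$, the fresh test pair $(y_{\rm new},{\boldsymbol x}_{\rm new})$, and the name $\mathrm{Err}_n$ are introduced here for illustration only.

$$
\mathbb{P}\big(y_i = +1 \,\big|\, {\boldsymbol x}_i\big) = f\big(\langle {\boldsymbol \theta}_*,{\boldsymbol x}_i \rangle\big),
\qquad
\hat{\boldsymbol \theta} \in \arg\max_{\|{\boldsymbol \theta}\|_2 \le 1}\; \min_{i\le n}\; y_i \langle {\boldsymbol \theta},{\boldsymbol x}_i \rangle ,
$$

$$
\mathrm{Err}_n = \mathbb{P}\big(y_{\rm new}\,\langle \hat{\boldsymbol \theta},{\boldsymbol x}_{\rm new} \rangle \le 0\big).
$$

In this notation the data are linearly separable exactly when the maximum of $\min_{i\le n} y_i \langle {\boldsymbol \theta},{\boldsymbol x}_i \rangle$ is strictly positive, and the "limiting generalization error" is the limit of $\mathrm{Err}_n$ as $n,p\to\infty$ with $p/n\to\psi$.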


