It is widely believed that network pruning not only reduces the computational cost of deep networks but also prevents overfitting by decreasing model capacity. Surprisingly, our work shows that network pruning can sometimes even aggravate overfitting. We report an unexpected sparse double descent phenomenon: as model sparsity increases via network pruning, test performance first degrades (due to overfitting), then improves (as overfitting is relieved), and finally degrades again (as useful information is forgotten). While recent studies have focused on deep double descent with respect to model overparameterization, they have not recognized that sparsity can also cause double descent. This paper makes three main contributions. First, we report the novel sparse double descent phenomenon through extensive experiments. Second, we propose a learning-distance interpretation of this phenomenon: the curve of the $\ell_{2}$ learning distance of sparse models (from initialized to final parameters) correlates well with the sparse double descent curve and may reflect generalization better than minima flatness. Third, in the context of sparse double descent, a winning ticket in the lottery ticket hypothesis surprisingly may not always win.
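Below is a minimal sketch, not the authors' code, of the two quantities the abstract refers to: a global magnitude-pruning mask at a given sparsity level, and the $\ell_{2}$ learning distance between initialized and final parameters. It assumes a standard PyTorch model; the helper names `magnitude_mask` and `learning_distance` are hypothetical.

```python
import torch
import torch.nn as nn


def magnitude_mask(model: nn.Module, sparsity: float) -> dict:
    """Global magnitude pruning (one common way to sparsify a network):
    zero out the `sparsity` fraction of weights with the smallest magnitude."""
    weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = int(sparsity * weights.numel())
    # k-th smallest magnitude serves as the pruning threshold.
    threshold = weights.kthvalue(k).values if k > 0 else weights.min() - 1.0
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters()}


def learning_distance(init_model: nn.Module, final_model: nn.Module) -> float:
    """l2 learning distance: Euclidean norm of the difference between the
    initialized and final parameters, concatenated into one vector."""
    diffs = [pf.detach().flatten() - pi.detach().flatten()
             for pi, pf in zip(init_model.parameters(), final_model.parameters())]
    return torch.norm(torch.cat(diffs), p=2).item()
```

Under these assumptions, one would sweep `sparsity` over a range of levels, retrain the masked network at each level, and record test error alongside `learning_distance`; the abstract's claim is that the resulting test-error curve exhibits double descent and that the learning-distance curve tracks it.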