Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks

The lottery ticket hypothesis (LTH) states that learning on a properly pruned network (the winning ticket) improves test accuracy over the original unpruned network. Although LTH has been justified empirically in a broad range of applications involving deep neural networks (DNNs), such as computer vision and natural language processing, a theoretical validation of the improved generalization of a winning ticket remains elusive. To the best of our knowledge, our work is the first to characterize the performance of training a pruned neural network by analyzing the geometric structure of the objective function and the sample complexity required to achieve zero generalization error. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned, indicating the structural importance of a winning ticket. Moreover, when the algorithm for training a pruned neural network is specified as an (accelerated) stochastic gradient descent algorithm, we theoretically show that the number of samples required to achieve zero generalization error is proportional to the number of non-pruned weights in the hidden layer. With a fixed number of samples, training a pruned neural network enjoys a faster convergence rate to the desired model than training the original unpruned one, providing a formal justification of the improved generalization of the winning ticket. Our theoretical results are derived for learning a pruned neural network with one hidden layer, and experimental results are further provided to justify the implications for pruning multi-layer neural networks.
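The setting described above, training a one-hidden-layer network whose pruned weights stay fixed at zero with (accelerated) SGD, can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: the model f(x) = (1/K) Σ_k ReLU(⟨m_k ⊙ w_k, x⟩), the per-neuron magnitude pruning rule in the hypothetical helper `magnitude_mask`, heavy-ball momentum as a stand-in for the "accelerated" variant, the planted teacher that generates the labels, and the initialization near that teacher are all our assumptions. The hypothetical `count_unpruned` reports the number of non-pruned hidden-layer weights, the quantity the abstract ties the sample complexity to.

```python
import random

# Hypothetical sketch: model form, pruning rule, optimizer and all names are assumptions.

def relu(z):
    return z if z > 0.0 else 0.0

def magnitude_mask(W, keep_per_row):
    """Hypothetical pruning rule: keep the keep_per_row largest-|w| entries of each hidden neuron."""
    mask = []
    for row in W:
        order = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
        kept = set(order[:keep_per_row])
        mask.append([1.0 if i in kept else 0.0 for i in range(len(row))])
    return mask

def count_unpruned(mask):
    """Number of non-pruned hidden-layer weights."""
    return int(sum(sum(row) for row in mask))

def preact(w_k, m_k, x):
    """Pre-activation <m_k * w_k, x> of one hidden neuron."""
    return sum(w * m * xi for w, m, xi in zip(w_k, m_k, x))

def forward(W, mask, x):
    """Assumed model: f(x) = (1/K) * sum_k relu(<m_k * w_k, x>), output weights fixed to 1/K."""
    return sum(relu(preact(w_k, m_k, x)) for w_k, m_k in zip(W, mask)) / len(W)

def mse(W, mask, data):
    """Empirical loss (1 / 2n) * sum_i (f(x_i) - y_i)^2."""
    return sum((forward(W, mask, x) - y) ** 2 for x, y in data) / (2 * len(data))

def train_masked_momentum_sgd(W, mask, data, lr=0.05, beta=0.5, epochs=30, seed=0):
    """Heavy-ball momentum SGD (stand-in for 'accelerated'); the mask zeroes pruned gradients."""
    rng = random.Random(seed)
    K, d = len(W), len(W[0])
    W = [[w * m for w, m in zip(row, mrow)] for row, mrow in zip(W, mask)]  # pruned entries start at 0
    V = [[0.0] * d for _ in range(K)]  # momentum buffer
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = data[idx]
            residual = forward(W, mask, x) - y  # derivative of 0.5 * (f - y)^2 w.r.t. f
            for k in range(K):
                active = preact(W[k], mask[k], x) > 0.0  # ReLU derivative is 1 or 0
                for i in range(d):
                    g = residual * mask[k][i] * x[i] / K if active else 0.0
                    V[k][i] = beta * V[k][i] + g
                    W[k][i] -= lr * V[k][i]
    return W

def demo(seed=0):
    """Planted teacher with a pruned mask; the student is handed that mask (an idealized ticket)."""
    rng = random.Random(seed)
    K, d, r, n = 3, 6, 2, 40  # hidden neurons, input dim, kept weights per neuron, samples
    W_star = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]
    mask = magnitude_mask(W_star, r)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        data.append((x, forward(W_star, mask, x)))
    # Assumed local initialization: the teacher plus Gaussian noise.
    W0 = [[w + 0.3 * rng.gauss(0.0, 1.0) for w in row] for row in W_star]
    W1 = train_masked_momentum_sgd(W0, mask, data)
    return mse(W0, mask, data), mse(W1, mask, data), W1, mask

if __name__ == "__main__":
    before, after, W1, mask = demo()
    print(f"non-pruned weights: {count_unpruned(mask)} of {len(mask) * len(mask[0])}")
    print(f"training loss: {before:.4f} -> {after:.4f}")
```

Because the mask multiplies every gradient, pruned entries never leave zero, so only `count_unpruned(mask)` parameters (6 of 18 in this toy) are fitted from the data. The step size, momentum, noise level and sizes here are illustrative choices, not values from the paper.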


Related content

Neural Networks


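Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. It welcomes high-quality submissions that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This uniquely broad scope facilitates the exchange of ideas between biological and technological research and helps foster an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications. Website: http://dblp.uni-trier.de/db/journals/nn/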

[How to Do Research] How to research, 22-page PPT


INRIA's latest "Machine Learning Theory" course notes, 176-page PDF


[ICML 2020] On the Generalization Benefit of Noise in Stochastic Gradient Descent
