The \textit{lottery ticket hypothesis} (LTH) states that learning on a properly pruned network (the \textit{winning ticket}) improves test accuracy over the original unpruned network. Although the LTH has been justified empirically across a broad range of applications involving deep neural networks (DNNs), such as computer vision and natural language processing, a theoretical validation of the improved generalization of a winning ticket remains elusive. To the best of our knowledge, our work, for the first time, characterizes the performance of training a pruned neural network by analyzing the geometric structure of the objective function and the sample complexity required to achieve zero generalization error. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned, indicating the structural importance of a winning ticket. Moreover, when the algorithm for training a pruned neural network is specified as an (accelerated) stochastic gradient descent algorithm, we theoretically show that the number of samples required to achieve zero generalization error is proportional to the number of non-pruned weights in the hidden layer. With a fixed number of samples, training a pruned neural network enjoys a faster convergence rate to the desired model than training the original unpruned one, providing a formal justification of the improved generalization of the winning ticket. Our theoretical results are derived for learning a pruned neural network with one hidden layer, while experimental results are further provided to justify the implications in pruning multi-layer neural networks.
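To make the setting concrete, the following is a minimal illustrative sketch (not the paper's algorithm or experiments) of the scenario described above: a one-hidden-layer ReLU network in which a binary magnitude-based mask fixes pruned hidden-layer weights at zero, trained by plain SGD. All dimensions, the data model, and the pruning ratio are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a one-hidden-layer ReLU network y = v^T relu(W x),
# with labels generated by a ground-truth model (a common teacher-student setting).
d, k, n = 10, 8, 200                 # input dim, hidden width, number of samples
W_star = rng.normal(size=(k, d))     # teacher weights used to generate labels
v = np.ones(k) / k                   # fixed output layer, as in many one-hidden-layer analyses

X = rng.normal(size=(n, d))
y = np.maximum(X @ W_star.T, 0.0) @ v

# Magnitude-based pruning: keep the largest-magnitude entries of an initial model
# and fix the rest at zero for the whole of training.
W = rng.normal(size=(k, d))
keep = 0.5                           # fraction of hidden-layer weights retained
thresh = np.quantile(np.abs(W), 1.0 - keep)
mask = (np.abs(W) >= thresh).astype(float)
W *= mask

lr = 0.05
for step in range(2000):
    i = rng.integers(n)              # SGD: one sample per step
    h = np.maximum(W @ X[i], 0.0)    # hidden-layer activations
    err = h @ v - y[i]
    grad = np.outer(v * (h > 0), X[i]) * err
    W -= lr * grad * mask            # masked gradient: pruned weights never move

loss = np.mean((np.maximum(X @ W.T, 0.0) @ v - y) ** 2)
```

Masking the gradient keeps the optimization confined to the subspace of non-pruned weights, which is the sense in which the effective number of trainable parameters (and hence the sample complexity in the abstract's result) is governed by the non-pruned weights.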