We study an approach to learning pruning masks by optimizing the expected loss of stochastic pruning masks, i.e., masks which zero out each weight independently with some weight-specific probability. We analyze the training dynamics of the induced stochastic predictor in the setting of linear regression, and observe a data-adaptive L1 regularization term, in contrast to the data-adaptive L2 regularization term known to underlie dropout in linear regression. We also observe a preference to prune weights that are less well-aligned with the data labels. We evaluate probabilistic fine-tuning for optimizing stochastic pruning masks for neural networks, starting from masks produced by several baselines. In each case, we see improvements in test error over baselines, even after we threshold fine-tuned stochastic pruning masks. Finally, since a stochastic pruning mask induces a stochastic neural network, we consider training the weights and/or pruning probabilities simultaneously to minimize a PAC-Bayes bound on generalization error. Using data-dependent priors, we obtain a self-bounded learning algorithm with strong performance and numerically tight bounds. In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.
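As a point of reference for the linear-regression analysis summarized above, the expected squared loss under an independent Bernoulli mask admits a standard bias-variance decomposition. The sketch below uses notation not fixed by the abstract: a design matrix $X$ with feature columns $x_i$, labels $y$, weights $w$, per-weight keep probabilities $\theta_i$, and mask entries $m_i \sim \mathrm{Bernoulli}(\theta_i)$ drawn independently:
\[
\mathbb{E}_{m}\!\left[\lVert X(m \odot w) - y\rVert_2^2\right]
= \lVert X(\theta \odot w) - y\rVert_2^2
+ \sum_i \theta_i (1-\theta_i)\, w_i^2\, \lVert x_i\rVert_2^2 .
\]
Read as a function of $w$ at a fixed mask distribution, the second term is a data-adaptive L2-type penalty of the kind associated with dropout; the L1-type behaviour described above arises from the training dynamics in $\theta$, which this decomposition alone does not exhibit.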