Pruning neural networks before training has received increasing interest due to its potential to reduce training time and memory. One popular method is to prune connections based on a certain metric, but it is not entirely clear which metric is the best choice. Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of sufficiently large neural networks are closely related to the spectrum of the NTK. Motivated by this finding, we propose to prune the connections that have the least influence on the spectrum of the NTK. This method can help preserve the NTK spectrum, which may help align the training dynamics of the pruned network with that of its dense counterpart. However, one possible issue is that the fixed-weight NTK corresponding to a given initialization can be very different from the NTK corresponding to later iterates during training. To address this, we further propose to sample multiple realizations of random weights when estimating the NTK spectrum. Note that our approach is weight-agnostic, in contrast to most existing methods, which are weight-dependent. In addition, we use random inputs to compute the fixed-weight NTK, making our method data-agnostic as well. We name our foresight pruning algorithm Neural Tangent Kernel Spectrum-Aware Pruning (NTK-SAP). Empirically, our method achieves better performance than all baselines on multiple datasets.
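To make the idea concrete, below is a minimal PyTorch sketch of a spectrum-aware foresight-pruning score in the spirit described above. It uses the trace of the fixed-weight NTK as a tractable proxy for the spectrum, estimates that trace with a finite-difference perturbation, and averages the resulting mask sensitivities over multiple random weight realizations and random inputs. The trace proxy, the finite-difference estimate, and all names and hyperparameters here (`ntk_sap_scores`, `n_weight_draws`, `eps`, etc.) are illustrative assumptions, not the paper's exact implementation.

```python
import torch
from torch.func import functional_call

def ntk_sap_scores(make_model, input_shape, n_weight_draws=3, n_inputs=16, eps=1e-2):
    """Per-weight saliency: |d(NTK-trace proxy)/d(mask)|, averaged over random
    weight draws (weight-agnostic) and random inputs (data-agnostic)."""
    scores = None
    for _ in range(n_weight_draws):
        model = make_model()                                   # fresh random initialization
        params = dict(model.named_parameters())
        masks = {k: torch.ones_like(v, requires_grad=True) for k, v in params.items()}
        perturb = {k: eps * torch.randn_like(v) for k, v in params.items()}
        x = torch.randn(n_inputs, *input_shape)                # random (data-free) inputs

        def forward(delta):
            # Apply the (differentiable) masks to possibly perturbed weights.
            w = {k: (params[k] + delta[k]) * masks[k] for k in params}
            return functional_call(model, w, (x,))

        zero = {k: torch.zeros_like(v) for k in params for v in [params[k]]}
        # Finite-difference proxy for the trace of the fixed-weight NTK.
        proxy = ((forward(perturb) - forward(zero)) ** 2).sum()
        grads = torch.autograd.grad(proxy, list(masks.values()))
        contrib = {k: g.abs() for k, g in zip(masks, grads)}
        scores = contrib if scores is None else {k: scores[k] + contrib[k] for k in scores}
    # Connections with the smallest scores would be pruned.
    return {k: v / n_weight_draws for k, v in scores.items()}
```

A usage sketch: call `ntk_sap_scores(lambda: MyConvNet(), (3, 32, 32))`, flatten the returned score tensors, and zero out the globally lowest-scoring fraction of weights before training begins.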