We show that the error of iteratively magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different pruned densities are interchangeable. We demonstrate the accuracy of this approximation over orders of magnitude in depth, width, dataset size, and density. We show that the functional form holds (generalizes) for large-scale data (e.g., ImageNet) and architectures (e.g., ResNets). As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
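The iterative magnitude pruning procedure the abstract refers to can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it omits the retraining step between pruning rounds, and the function and parameter names (`iterative_magnitude_prune`, `target_density`, `prune_frac`) are ours.

```python
import numpy as np

def iterative_magnitude_prune(weights, target_density, prune_frac=0.2):
    """Iteratively zero out the smallest-magnitude surviving weights
    until at most `target_density` of the weights remain unpruned.
    Illustrative sketch only; real IMP retrains between rounds."""
    mask = np.ones_like(weights, dtype=bool)
    while mask.mean() > target_density:
        surviving = np.abs(weights[mask])
        # remove the smallest `prune_frac` fraction of surviving weights
        k = max(1, int(prune_frac * surviving.size))
        threshold = np.partition(surviving, k - 1)[k - 1]
        mask &= np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
pruned, mask = iterative_magnitude_prune(w, target_density=0.1)
print(mask.mean())  # surviving density, at most 0.1
```

Pruning a fixed fraction per round (rather than all at once) is what makes the final densities form the geometric sequence over which the scaling law is fit.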