A central goal in deep learning is to learn compact representations of features at every layer of a neural network, which is useful for both unsupervised representation learning and structured network pruning. While there is a growing body of work on structured pruning, current state-of-the-art methods suffer from two key limitations: (i) instability during training, and (ii) the need for an additional fine-tuning step, which is resource-intensive. At the core of these limitations is the lack of a systematic approach that jointly prunes and refines weights during training in a single stage and, upon convergence, achieves state-of-the-art performance without any fine-tuning. We present a novel single-stage structured pruning method termed DiscriminAtive Masking (DAM). The key intuition behind DAM is to discriminatively prefer some neurons for refinement during training while gradually masking out the others. We show that DAM performs remarkably well across a diverse set of applications, including dimensionality reduction, recommender systems, graph representation learning, and structured pruning for image classification. We also show theoretically that the learning objective of DAM is directly related to minimizing the L0 norm of the masking layer.
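To make the masking intuition concrete, the sketch below illustrates one plausible realization of a discriminative masking layer: neurons are given a fixed ordering, and a learnable soft cutoff keeps low-indexed neurons active for refinement while gradually driving high-indexed neurons toward zero. The sigmoid gate shape, the parameter names `alpha` and `beta`, and the `sparsity_penalty` surrogate are our illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class DiscriminativeMask(nn.Module):
    """Minimal sketch of a discriminative masking layer (assumed form).

    Neurons are assigned fixed positions 0..width-1. A learnable scalar
    cutoff `beta` determines which neurons stay active: the gate is ~1
    for positions well below `beta` (neurons kept and refined) and ~0
    for positions well above it (neurons gradually masked out).
    """

    def __init__(self, width: int, alpha: float = 5.0):
        super().__init__()
        # Fixed per-neuron positions used to order the neurons.
        self.register_buffer("position", torch.arange(width, dtype=torch.float32))
        # Learnable soft cutoff; initialized so all neurons start unmasked.
        self.beta = nn.Parameter(torch.tensor(float(width)))
        # Gate sharpness; an assumed knob that could be annealed during training.
        self.alpha = alpha

    def gate(self) -> torch.Tensor:
        # Smooth step: ~1 where position << beta, ~0 where position >> beta.
        return torch.sigmoid(self.alpha * (self.beta - self.position))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each feature by its gate value, masking high-indexed neurons.
        return x * self.gate()

    def sparsity_penalty(self) -> torch.Tensor:
        # Sum of gate values: a differentiable surrogate for the L0 norm
        # of the mask, i.e., the number of surviving neurons.
        return self.gate().sum()


if __name__ == "__main__":
    mask = DiscriminativeMask(width=8)
    x = torch.randn(4, 8)
    y = mask(x)
    # In training, one would add lam * mask.sparsity_penalty() to the task
    # loss so that a single stage jointly refines the surviving neurons
    # and shrinks beta, with no separate fine-tuning step afterwards.
    print(y.shape, mask.sparsity_penalty().item())
```

Because the gate's sum directly approximates the number of active neurons, penalizing it ties the training objective to the L0 norm of the masking layer, consistent with the theoretical claim in the abstract.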