The increasing computational requirements of deep neural networks (DNNs) have led to significant interest in obtaining DNN models that are sparse, yet accurate. Recent work has investigated the even harder case of sparse training, where the DNN weights are, as much as possible, already sparse in order to reduce computational costs during training. Existing sparse training methods are often empirical and can have lower accuracy relative to the dense baseline. In this paper, we present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of DNNs, demonstrate convergence for a variant of the algorithm, and show that AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets; at high sparsity levels, AC/DC even outperforms existing methods that rely on accurate pre-trained dense models. An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process. This is useful in practice, where compressed variants may be desirable for deployment in resource-constrained settings without re-doing the entire training flow, and it also provides insights into the accuracy gap between dense and compressed models. The code is available at https://github.com/IST-DASLab/ACDC.
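To make the alternating scheme concrete, the following is a minimal sketch of AC/DC-style training in PyTorch, assuming a simple global magnitude-pruning criterion. The phase length, sparsity level, and helper names (`magnitude_masks`, `phase_len`) are illustrative assumptions, not the authors' reference implementation; see https://github.com/IST-DASLab/ACDC for the actual code.

```python
# Sketch of alternating compressed/decompressed (AC/DC-style) training.
# Assumptions: global magnitude pruning, fixed-length alternating phases.
import torch


def magnitude_masks(model, sparsity):
    """Build per-parameter binary masks keeping the largest-magnitude weights."""
    scores = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = int(sparsity * scores.numel())
    threshold = torch.kthvalue(scores, k).values if k > 0 else scores.min() - 1
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters()}


def train(model, loader, optimizer, loss_fn, epochs, sparsity=0.9, phase_len=5):
    masks = None  # None => decompressed (dense) phase
    for epoch in range(epochs):
        # Alternate phases every `phase_len` epochs: dense <-> sparse.
        if (epoch // phase_len) % 2 == 1:
            if masks is None:
                masks = magnitude_masks(model, sparsity)  # enter compressed phase
        else:
            masks = None  # enter decompressed phase: all weights trainable again

        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
            # In the compressed phase, keep pruned weights at zero after each step.
            if masks is not None:
                with torch.no_grad():
                    for name, p in model.named_parameters():
                        p.mul_(masks[name])
    return model  # ending on a compressed phase yields the sparse model
```

Ending training in a compressed phase produces the sparse model, while the weights at the end of the preceding decompressed phase give the co-trained dense counterpart, which is how a sparse-dense model pair can be obtained from a single run.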