State-of-the-art deep learning models have parameter counts that reach into the billions. Training, storing, and transferring such models are energy- and time-consuming, and therefore costly. A large part of these costs is incurred by training the network. Model compression lowers storage and transfer costs, and it can further make training more efficient by decreasing the number of computations in the forward and/or backward pass. Compressing networks already at training time, while maintaining high performance, is therefore an important research topic. This work surveys methods that reduce the number of trained weights in deep learning models throughout training. Most of the presented methods set network parameters to zero, which is called pruning. The pruning approaches are categorized into pruning at initialization, lottery tickets, and dynamic sparse training. Moreover, we discuss methods that freeze parts of a network at its random initialization. Freezing weights shrinks the number of trainable parameters, which reduces gradient computations and the dimensionality of the model's optimization space. In this survey, we first propose dimensionality reduced training as an underlying mathematical model that covers both pruning and freezing during training. Afterwards, we present and discuss different dimensionality reduced training methods.
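To make the two mechanisms discussed above concrete, the following minimal PyTorch sketch illustrates (under our own simplified assumptions, not as any particular method from the survey) how pruning can be realized by keeping a random binary mask of weights at zero, and how freezing can be realized by excluding a parameter from gradient updates so it stays at its random initialization.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a single linear layer whose weight matrix is
# partly pruned (fixed at zero via a binary mask) and whose bias is frozen
# (kept at its random initialization, excluded from gradient updates).
layer = nn.Linear(128, 64)

# Pruning: a random binary mask keeps roughly 20% of the weights and
# sets the rest to zero; re-applying it after every step keeps them zero.
mask = (torch.rand_like(layer.weight) > 0.8).float()
with torch.no_grad():
    layer.weight.mul_(mask)

# Freezing: no gradients are computed or stored for the bias, which
# shrinks the number of trainable parameters.
layer.bias.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in layer.parameters() if p.requires_grad], lr=0.1
)

# Dummy regression data for the sketch.
x = torch.randn(32, 128)
target = torch.randn(32, 64)

for _ in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(layer(x), target)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        layer.weight.mul_(mask)  # keep pruned weights at zero
```

In this toy setup the trainable parameter count is reduced both by the mask (pruned weights carry no information) and by the frozen bias (no gradient is needed for it), mirroring the reduction of the optimization space described above.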