Training of neural networks is a computationally intensive task. The significance of understanding and modeling the training dynamics is growing as increasingly larger networks are being trained. In this work, we propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality of the problem. We refer to our algorithm as \emph{correlation mode decomposition} (CMD). It splits the parameter space into groups of parameters (modes) that behave in a highly correlated manner through the epochs. With this approach we achieve a remarkable dimensionality reduction: networks such as ResNet-18, transformers, and GANs, containing millions of parameters, can be modeled well using just a few modes. We observe that the typical time profile of each mode is spread throughout the network, across all layers. Moreover, our model induces regularization, which yields better generalization on the test set. This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
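To make the high-level idea concrete, the following is a minimal, self-contained sketch (not the paper's actual procedure) of correlation-based mode decomposition on a toy trajectory matrix. It assumes a hypothetical input \texttt{W} of shape (num\_params, num\_epochs) holding each parameter's value over training, greedily picks a few reference trajectories, assigns every parameter to the reference it is most correlated with, and fits an affine map from each mode's reference profile to each parameter.
\begin{verbatim}
import numpy as np

def cmd_sketch(W, num_modes=3):
    """Toy correlation-mode decomposition (illustrative only).

    W : (num_params, num_epochs) array; each row is one parameter's
        trajectory over training epochs (hypothetical input).
    Returns mode assignments, affine coefficients (a, b) such that
    w_i(t) ~= a_i * r_{m(i)}(t) + b_i, and the reference indices.
    """
    # Normalize trajectories so dot products act as correlations.
    Wc = W - W.mean(axis=1, keepdims=True)
    Wn = Wc / (np.linalg.norm(Wc, axis=1, keepdims=True) + 1e-12)

    # Greedy reference selection: start from parameter 0, then keep
    # adding the trajectory least correlated with the chosen ones.
    refs = [0]
    for _ in range(num_modes - 1):
        corr_to_refs = np.abs(Wn @ Wn[refs].T).max(axis=1)
        refs.append(int(np.argmin(corr_to_refs)))

    # Assign each parameter to its most correlated reference (mode).
    corr = Wn @ Wn[refs].T                    # (num_params, num_modes)
    modes = np.abs(corr).argmax(axis=1)

    # Fit w_i(t) ~= a_i * r(t) + b_i against the assigned reference.
    a = np.empty(W.shape[0]); b = np.empty(W.shape[0])
    for i in range(W.shape[0]):
        r = W[refs[modes[i]]]
        a[i], b[i] = np.polyfit(r, W[i], deg=1)
    return modes, a, b, refs

# Usage on synthetic data: 1000 "parameters" over 50 epochs, built
# from 3 hidden time profiles plus noise; CMD should recover the 3 groups.
rng = np.random.default_rng(0)
profiles = rng.standard_normal((3, 50)).cumsum(axis=1)
assign = rng.integers(0, 3, size=1000)
W = (rng.standard_normal((1000, 1)) * profiles[assign]
     + rng.standard_normal((1000, 1))
     + 0.05 * rng.standard_normal((1000, 50)))
modes, a, b, refs = cmd_sketch(W, num_modes=3)
print("recovered mode sizes:", np.bincount(modes))
\end{verbatim}
The point of the sketch is the dimensionality reduction: once each parameter is tied to one of a few reference profiles by two scalars, the whole trajectory matrix is summarized by the modes' time profiles plus per-parameter affine coefficients.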