Although deep neural networks contain a massive number of parameters, their training can actually proceed in a rather low-dimensional space. By investigating this low-dimensional property of the training trajectory, we propose Dynamic Linear Dimensionality Reduction (DLDR), which dramatically reduces the parameter space to a variable subspace of significantly lower dimension. Since only a few variables remain to be optimized, second-order methods become applicable. Following this idea, we develop a quasi-Newton-based algorithm to train the variables obtained by DLDR, rather than the original parameters of the neural network. The experimental results strongly support the dimensionality reduction performance: for many standard neural networks, optimizing over only 40 variables achieves performance comparable to regular training over thousands or even millions of parameters.
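As a rough illustration of the idea only (the abstract does not specify the exact procedure), the sketch below assumes that DLDR samples flattened parameter snapshots along the training trajectory and extracts the leading directions with a thin SVD; the names `dldr_basis`, `project`, and `lift` are hypothetical helpers, not the authors' API.

```python
import numpy as np

def dldr_basis(snapshots, d=40):
    """Hypothetical DLDR sketch: extract a d-dimensional basis from
    parameter snapshots sampled along the training trajectory.
    snapshots: (n_samples, n_params) array of flattened weights."""
    mean = snapshots.mean(axis=0)
    centered = snapshots - mean
    # Top-d principal directions of the trajectory via thin SVD.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:d]                     # shape (d, n_params)
    return mean, basis

def project(params, mean, basis):
    """Map full parameters to the d low-dimensional variables."""
    return basis @ (params - mean)

def lift(z, mean, basis):
    """Map the d variables back to the full parameter space,
    where a quasi-Newton method could update z instead of params."""
    return mean + basis.T @ z
```

Under this assumption, a second-order optimizer would only need to handle a d-by-d (e.g., 40-by-40) problem over `z`, which is what makes quasi-Newton updates affordable.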