Good initialization is essential for training Deep Neural Networks (DNNs). Oftentimes such initialization is found through trial and error, which has to be repeated every time an architecture is substantially modified, or is inherited from smaller networks, leading to sub-optimal choices. In this work we introduce a new and cheap algorithm that finds a good initialization automatically for general feed-forward DNNs. The algorithm utilizes the Jacobian between adjacent network blocks to tune the network hyperparameters to criticality. We solve the dynamics of the algorithm for fully connected networks with ReLU activations and derive conditions for its convergence. We then extend the discussion to more general architectures with BatchNorm and residual connections. Finally, we apply our method to ResMLP and VGG architectures, where the automatic one-shot initialization found by our method shows good performance on vision tasks.
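To make the idea concrete, the following is a minimal, illustrative sketch of Jacobian-based tuning for a fully connected ReLU network. It assumes the criticality condition is that the mean squared singular value of each block's input-output Jacobian is close to one, estimated with a random Jacobian-vector-product probe; the function names (`block_jacobian_norm`, `tune_to_criticality`), the multiplicative rescaling rule, and all hyperparameters are hypothetical choices for exposition, not the paper's exact algorithm.

```python
import torch
import torch.nn as nn

def block_jacobian_norm(block, x):
    """Estimate the mean squared singular value of the block's input-output
    Jacobian, tr(J J^T) / n, via a random Jacobian-vector-product probe."""
    v = torch.randn_like(x)
    _, jvp = torch.autograd.functional.jvp(block, x, v)
    return (jvp.pow(2).sum() / v.pow(2).sum()).item()

def tune_to_criticality(blocks, x, lr=0.5, steps=50, tol=1e-3):
    """Rescale each block's weight matrices multiplicatively until the
    Jacobian-norm estimate is close to 1 (the assumed criticality condition)."""
    h = x
    for block in blocks:
        for _ in range(steps):
            j2 = block_jacobian_norm(block, h)
            if abs(j2 - 1.0) < tol:
                break
            with torch.no_grad():
                for p in block.parameters():
                    if p.dim() > 1:               # rescale weights only, leave biases
                        p.mul_(j2 ** (-lr / 2))   # J scales linearly with the weight scale
        h = block(h).detach()                      # propagate the data to the next block
    return blocks

# Toy usage: a 10-block fully connected ReLU network and a batch of Gaussian inputs.
blocks = [nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(10)]
x = torch.randn(64, 512)
tune_to_criticality(blocks, x)
```

The sketch tunes blocks sequentially, feeding the (partially tuned) activations forward so that each block is brought to criticality on the data distribution it will actually see, which is in the spirit of the one-shot, data-dependent initialization described above.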