We train deep residual networks with a stochastic variant of the nonlinear multigrid method MG/OPT. To build the multilevel hierarchy, we use the dynamical-systems viewpoint specific to residual networks. We report significant speed-ups and improved robustness when training deep residual networks on MNIST. Our numerical experiments also indicate that multilevel training can serve as a pruning technique, as many of the auxiliary networks reach accuracies comparable to the original network.