In this study, we investigate learning rate adaptation at different levels based on the hyper-gradient descent framework and propose a method that adaptively learns the optimizer parameters by combining multiple levels of learning rates in a hierarchical structure. We also show the relationship between regularizing over-parameterized learning rates and building combinations of adaptive learning rates at different levels. Experiments on several network architectures, including feed-forward networks, LeNet-5, and ResNet-18/34, show that the proposed multi-level adaptive approach can outperform baseline adaptive methods in a variety of circumstances.
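The idea of combining learning rates adapted at different levels can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes plain SGD as the base optimizer, only two levels (one global scalar and one rate per parameter), and a fixed mixing weight. The names `multilevel_hd_sgd`, `alpha0`, `beta`, and `gamma` are hypothetical and chosen for illustration only. Each learning rate is updated by a hyper-gradient step, using the fact that for SGD the derivative of the loss with respect to the learning rate is the negative product of the current and previous gradients.

```python
import numpy as np


def multilevel_hd_sgd(grad_fn, theta0, steps=200, alpha0=0.01, beta=1e-4, gamma=0.5):
    """SGD with hyper-gradient-adapted learning rates at two levels (illustrative).

    Assumptions (not taken from the abstract):
      - global level: a single scalar learning rate `alpha_global`
      - element level: one learning rate per parameter, `alpha_elem`
      - effective rate: gamma * alpha_elem + (1 - gamma) * alpha_global
    """
    theta = np.asarray(theta0, dtype=float).copy()
    alpha_global = float(alpha0)
    alpha_elem = np.full_like(theta, alpha0)
    prev_grad = np.zeros_like(theta)

    for _ in range(steps):
        g = grad_fn(theta)

        # The previous update was theta = theta_prev - alpha_eff * prev_grad, so
        # d loss / d alpha_eff[i] = -g[i] * prev_grad[i]. The chain rule through
        # the mixing weights gives the hyper-gradient at each level.
        alpha_global += beta * (1.0 - gamma) * float(g @ prev_grad)
        alpha_elem += beta * gamma * g * prev_grad

        # Combine the two levels into the rate actually used for this step.
        alpha_eff = gamma * alpha_elem + (1.0 - gamma) * alpha_global
        theta -= alpha_eff * g
        prev_grad = g

    return theta, alpha_global, alpha_elem


# Toy problem (assumed): f(theta) = 0.5 * sum(c * theta**2) with uneven curvature.
c = np.array([1.0, 10.0])


def quad_loss(theta):
    return 0.5 * float(np.sum(c * theta ** 2))


def quad_grad(theta):
    return c * theta


theta_final, a_global, a_elem = multilevel_hd_sgd(quad_grad, [1.0, 1.0])
```

With `gamma = 0` the sketch reduces to a single global hyper-gradient learning rate, and with `gamma = 1` to purely per-parameter rates. Intermediate values mix the two levels, which is the kind of combination the abstract relates to regularizing over-parameterized learning rates.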