We make contributions towards improving the performance of adaptive optimizers. Our improvements are based on suppressing the range of the adaptive stepsizes in the AdaBelief optimizer. Firstly, we show that the particular placement of the parameter epsilon within the update expressions of AdaBelief reduces the range of the adaptive stepsizes, bringing AdaBelief closer to SGD with momentum. Secondly, we extend AdaBelief by further suppressing the range of the adaptive stepsizes. To achieve this, we perform mutual layerwise vector projections between the gradient g_t and its first momentum m_t before using them to estimate the second momentum. The new optimization method is referred to as Aida. Thirdly, extensive experimental results show that Aida outperforms nine optimizers when training transformers and LSTMs for NLP, and VGG and ResNet for image classification over CIFAR10 and CIFAR100, while matching the best performance of the nine methods when training WGAN-GP models for image generation. Furthermore, Aida produces higher validation accuracy than AdaBelief when training ResNet18 over ImageNet. Code is available at https://github.com/guoqiang-x-zhang/AidaOptimizer
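To make the mutual-projection step more concrete, the following is a minimal PyTorch sketch of how a layerwise projection between g_t and m_t could feed into an AdaBelief-style second-momentum update. The iteration count K, the epsilon placement, and the function names (mutual_projection, update_second_momentum) are illustrative assumptions rather than the authors' reference implementation.

```python
import torch

def mutual_projection(g: torch.Tensor, m: torch.Tensor, K: int = 2,
                      eps: float = 1e-20):
    """Alternately project the two layerwise vectors onto each other.

    Each projection shrinks the component of one vector orthogonal to the
    other, which is one way the range of the resulting adaptive stepsizes
    can be suppressed. K and eps are illustrative choices.
    """
    g_k, m_k = g.flatten(), m.flatten()
    for _ in range(K):
        # Project g onto the direction of m, and m onto the direction of g.
        g_new = (torch.dot(g_k, m_k) / (torch.dot(m_k, m_k) + eps)) * m_k
        m_new = (torch.dot(m_k, g_k) / (torch.dot(g_k, g_k) + eps)) * g_k
        g_k, m_k = g_new, m_new
    return g_k.view_as(g), m_k.view_as(m)

def update_second_momentum(v, g, m, beta2=0.999, K=2):
    """AdaBelief-style second-momentum update built on the projected vectors.

    Hypothetical helper: replaces the raw (m_t - g_t)^2 statistic with the
    squared difference of the mutually projected vectors.
    """
    g_p, m_p = mutual_projection(g, m, K)
    diff = m_p - g_p
    return beta2 * v + (1.0 - beta2) * diff * diff
```

In this sketch, the projection is applied per parameter tensor (i.e., layerwise), so flattening each layer's gradient and momentum before the dot products matches the granularity described in the abstract.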