Unintended Effects on Adaptive Learning Rate for Training Neural Network with Output Scale Change

A multiplicative constant scaling factor is often applied to the model output to adjust the dynamics of neural network parameters. This has been used as one of the key interventions in an empirical study of lazy and active behavior. However, we show that the combination of such scaling and a commonly used adaptive learning rate optimizer strongly affects the training behavior of the neural network. This is problematic as it can cause unintended behavior of neural networks, resulting in the misinterpretation of experimental results. Specifically, for some scaling settings, the effect of the adaptive learning rate disappears or is strongly influenced by the scaling factor. To avoid the unintended effect, we present a modification of an optimization algorithm and demonstrate remarkable differences between adaptive learning rate optimization and simple gradient descent, especially with a small (< 1.0) scaling factor.
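The interaction described in the abstract can be illustrated with a small numerical sketch. It is not the authors' experiment or their proposed modification, which the abstract does not specify. It assumes, purely for illustration, that the output scaling factor `alpha` multiplies the parameter gradient, and it uses the standard Adam update with its default hyperparameters; `base_grad`, `adam_first_step`, and `sgd_step` are hypothetical names introduced here.

```python
import math

def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Parameter update of the first Adam step (t = 1) for a scalar gradient."""
    m = (1 - beta1) * grad             # first-moment estimate
    v = (1 - beta2) * grad * grad      # second-moment estimate
    m_hat = m / (1 - beta1)            # bias correction at t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

def sgd_step(grad, lr=1e-3):
    """Parameter update of plain gradient descent."""
    return -lr * grad

# Hypothetical gradient of the unscaled model (an assumption for this sketch).
base_grad = 0.5

for alpha in (1e-12, 1e-3, 1.0, 1e3):
    g = alpha * base_grad              # assumed effect of the output scaling
    print(f"alpha={alpha:g}  sgd={sgd_step(g):.3e}  adam={adam_first_step(g):.3e}")
```

Under these assumptions the gradient descent update grows in proportion to `alpha`, while the Adam update stays close to `-lr` as long as the gradient is much larger than `eps`. Once the scaled gradient falls well below `eps`, the Adam update shrinks roughly in proportion to the gradient, so the normalization that makes the learning rate adaptive no longer has an effect.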
