Unintended Effects on Adaptive Learning Rate for Training Neural Network with Output Scale Change

A multiplicative constant scaling factor is often applied to the model output to adjust the dynamics of neural network parameters. This has been used as one of the key interventions in an empirical study of lazy and active behavior. However, we show that the combination of such scaling and a commonly used adaptive learning rate optimizer strongly affects the training behavior of the neural network. This is problematic as it can cause unintended behavior of neural networks, resulting in the misinterpretation of experimental results. Specifically, for some scaling settings, the effect of the adaptive learning rate disappears or is strongly influenced by the scaling factor. To avoid the unintended effect, we present a modification of an optimization algorithm and demonstrate remarkable differences between adaptive learning rate optimization and simple gradient descent, especially with a small (< 1.0) scaling factor.
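The abstract does not spell out why output scaling interacts with adaptive optimizers. One plausible mechanism, assumed here and not taken from the paper, is the epsilon term in Adam: the update m_hat / (sqrt(v_hat) + eps) is invariant to the gradient's scale as long as the gradient is much larger than eps, but once a small scaling factor shrinks the gradient toward eps, the step becomes roughly lr * grad / eps, i.e. plain gradient descent with a very large learning rate. The sketch below is a simplified, hypothetical illustration of that effect. It assumes the output scale `alpha` multiplies the parameter gradient directly (ignoring how `alpha` also changes the loss derivative), uses common Adam defaults (lr=1e-3, betas 0.9/0.999, eps=1e-8), and the names `adam_first_step`, `sgd_step` and `base_grad` are illustrative only. It is not the authors' proposed modification.

```python
import math


def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """First bias-corrected Adam update for a scalar parameter with m = v = 0."""
    m = (1 - beta1) * grad          # first-moment estimate after one step
    v = (1 - beta2) * grad * grad   # second-moment estimate after one step
    m_hat = m / (1 - beta1)         # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)


def sgd_step(grad, lr=1e-3):
    """Plain gradient-descent update for comparison."""
    return -lr * grad


# Hypothetical gradient of the unscaled model; scaling the output by alpha
# is assumed (for illustration) to scale this gradient by alpha.
base_grad = 0.5
for alpha in (1e3, 1.0, 1e-3, 1e-6, 1e-9, 1e-12):
    g = alpha * base_grad
    print(f"alpha={alpha:8.0e}  grad={g:9.2e}  "
          f"adam={adam_first_step(g):10.3e}  sgd={sgd_step(g):10.3e}")
```

Under these assumptions, the Adam step stays close to -lr for alpha from 1e3 down to about 1e-6, so the adaptive step ignores the scaling entirely. Around alpha = 1e-9 the eps term starts to dominate, and by alpha = 1e-12 the step is almost exactly lr * grad / eps, so the adaptive behavior has effectively vanished.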

