The optimizers that have claimed, over the years, to beat Adam
Let's take a look at the newest one: AdaBelief
Original Adam:
# Decay the first and second moment running average coefficient
exp_avg.mul_(beta1).add_(1 - beta1, grad)
exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
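In equation form (bias correction omitted here), these two lines keep exponential moving averages of the gradient and of its square:

$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$
$v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$

and the parameter step is roughly proportional to $m_t / (\sqrt{v_t} + \epsilon)$.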
AdaBelief code:
# Update first and second moment running average
exp_avg.mul_(beta1).add_(1 - beta1, grad)
grad_residual = grad - exp_avg
exp_avg_var.mul_(beta2).addcmul_(1 - beta2, grad_residual, grad_residual)
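The only change is in the second moment: instead of an EMA of $g_t^2$, AdaBelief tracks an EMA of the squared deviation of the gradient from its own EMA,

$s_t = \beta_2 s_{t-1} + (1-\beta_2)(g_t - m_t)^2$

so the denominator becomes $\sqrt{s_t} + \epsilon$: the step is large where the observed gradient agrees with the "belief" $m_t$, and small where it deviates. Below is a minimal, self-contained sketch that computes both second-moment variants side by side for one step; the helper name moment_updates and the keyword-argument forms of add_/addcmul_ are my own choices for illustration, not the released optimizer code:

import torch

def moment_updates(grad, exp_avg, exp_avg_sq, exp_avg_var, beta1=0.9, beta2=0.999):
    # Shared first moment: EMA of the gradient.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    # Adam-style second moment: EMA of grad ** 2.
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # AdaBelief-style second moment: EMA of (grad - exp_avg) ** 2,
    # using the already-updated exp_avg, as in the code above.
    grad_residual = grad - exp_avg
    exp_avg_var.mul_(beta2).addcmul_(grad_residual, grad_residual, value=1 - beta2)
    return exp_avg, exp_avg_sq, exp_avg_var

grad = torch.randn(3)
m, v, s = torch.zeros(3), torch.zeros(3), torch.zeros(3)
moment_updates(grad, m, v, s)
print(m, v, s)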
Experiments
Discussion