The optimizers that "claimed" to surpass Adam over the years
Let's take a look at the latest one: AdaBelief.
Original Adam:
# Decay the first and second moment running average coefficient
exp_avg.mul_(beta1).add_(1 - beta1, grad)
exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
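These two lines only update the moment estimates; the full Adam step also applies bias correction and the parameter update. Below is a minimal single-tensor sketch of one complete Adam step, written with current PyTorch in-place ops; the function name, hyperparameter defaults, and keyword-argument style are illustrative assumptions, not part of the original snippet.

import torch

# Minimal single-tensor Adam step (illustrative sketch; names and defaults are assumptions)
def adam_step(param, grad, exp_avg, exp_avg_sq, step,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and the squared gradient
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # Bias correction, then the adaptive parameter update
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias_correction2).sqrt_().add_(eps)
    param.addcdiv_(exp_avg, denom, value=-lr / bias_correction1)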
AdaBelief code:
# Update first and second moment running average
exp_avg.mul_(beta1).add_(1 - beta1, grad)
grad_residual = grad - exp_avg
exp_avg_var.mul_(beta2).addcmul_(1 - beta2, grad_residual, grad_residual)
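The only change is how the second moment is accumulated: AdaBelief tracks the squared deviation of the gradient from its running mean, and that variance estimate replaces exp_avg_sq in the denominator of the update. Here is a minimal sketch of one step, under the same assumptions as the Adam sketch above (not the reference implementation).

import torch

# Minimal single-tensor AdaBelief-style step (illustrative sketch)
def adabelief_step(param, grad, exp_avg, exp_avg_var, step,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: identical to Adam
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    # Second moment: variance of the gradient around its EMA (the "belief")
    grad_residual = grad - exp_avg
    exp_avg_var.mul_(beta2).addcmul_(grad_residual, grad_residual, value=1 - beta2)
    # Bias correction and update mirror Adam, with exp_avg_var in the denominator
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    denom = (exp_avg_var / bias_correction2).sqrt_().add_(eps)
    param.addcdiv_(exp_avg, denom, value=-lr / bias_correction1)

# Example call with hypothetical tensors of matching shape:
# p = torch.zeros(10); g = torch.randn(10)
# m = torch.zeros_like(p); v = torch.zeros_like(p)
# adabelief_step(p, g, m, v, step=1)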
Experiments
Discussion