We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods. MADGRAD shows excellent performance on deep learning optimization problems from multiple fields, including classification and image-to-image tasks in vision, and recurrent and bidirectionally-masked models in natural language processing. For each of these tasks, MADGRAD matches or outperforms both SGD and ADAM in test set performance, even on problems for which adaptive methods normally perform poorly.
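The abstract does not spell out the update rule, but the full paper's algorithm combines dual averaging with an AdaGrad-style squared-gradient accumulator and iterate-level momentum. Below is a minimal NumPy sketch of that recursion; the function name `madgrad_sketch`, the default hyperparameters, and the quadratic test problem are illustrative assumptions, not part of this abstract.

```python
import numpy as np

def madgrad_sketch(grad_fn, x0, steps, lr=1e-2, momentum=0.9, eps=1e-6):
    # Dual-averaging state: a weighted gradient sum and a weighted
    # squared-gradient sum, both starting at zero.
    x = x0.astype(float).copy()
    s = np.zeros_like(x)
    nu = np.zeros_like(x)
    for k in range(steps):
        g = grad_fn(x)
        lam = lr * np.sqrt(k + 1)               # increasing weight sequence
        s += lam * g                             # dual average of gradients
        nu += lam * g * g                        # AdaGrad-style accumulator
        z = x0 - s / (np.cbrt(nu) + eps)         # cube-root-scaled dual step
        x = momentum * x + (1.0 - momentum) * z  # momentum as averaging
    return x

# Toy usage (also an assumption): minimize f(x) = 0.5 * ||x||^2,
# whose gradient is simply x.
x_star = madgrad_sketch(grad_fn=lambda x: x,
                        x0=np.array([3.0, -2.0]),
                        steps=500)
print(x_star)  # should be close to the origin
```

A distinguishing design choice of MADGRAD over plain AdaGrad is the cube root (rather than square root) in the denominator of the dual-averaged step, which the paper motivates as the correct scaling for the dual-averaging form of the method.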