Why does the Adam optimizer work so well in deep-learning applications? Adam's originators, Kingma and Ba, presented a mathematical argument that was meant to help explain its success, but Bock and colleagues have since reported that a key piece is missing from that argument: an unproven lemma which we will call Bock's conjecture. Here we show that this conjecture is false, but we prove a modified version of it, a generalization of a result of Reddi and colleagues, which can take its place in analyses of Adam.
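For reference, a sketch of the Adam update rule under discussion, in the standard form given by Kingma and Ba; the notation here is illustrative rather than the paper's own, with step size $\alpha$, decay rates $\beta_1, \beta_2 \in [0,1)$, stochastic gradient $g_t$ at step $t$, and a small stabilizer $\epsilon > 0$:
\begin{align*}
  m_t &= \beta_1\, m_{t-1} + (1-\beta_1)\, g_t, \\
  v_t &= \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^{2}, \\
  \hat{m}_t &= m_t / \bigl(1 - \beta_1^{t}\bigr), \qquad \hat{v}_t = v_t / \bigl(1 - \beta_2^{t}\bigr), \\
  \theta_t &= \theta_{t-1} - \alpha\, \hat{m}_t / \bigl(\sqrt{\hat{v}_t} + \epsilon\bigr).
\end{align*}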