Compositionality is a basic structural feature of both biological and artificial neural networks. Learning compositional functions via gradient descent incurs well-known problems like vanishing and exploding gradients, making careful learning rate tuning essential for real-world applications. This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions. Based on this lemma, we derive Madam, a multiplicative version of the Adam optimiser, and show that it can train state-of-the-art neural network architectures without learning rate tuning. We further show that Madam is easily adapted to train natively compressed neural networks by representing their weights in a logarithmic number system. We conclude by drawing connections between multiplicative weight updates and recent findings about synapses in biology.
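To make the idea of a multiplicative weight update concrete, the following is a minimal NumPy sketch of one plausible update of the form w ← w ⊙ exp(−η sign(w) ⊙ ĝ), where ĝ is the gradient rescaled by its root-mean-square. The specific normalisation, the toy objective, and the function name are illustrative assumptions for this sketch and are not the exact Madam algorithm described in the paper.

```python
import numpy as np

def multiplicative_update(w, g, lr=0.01):
    """Hypothetical multiplicative weight update (illustrative sketch only).

    Each weight is scaled by exp(-lr * sign(w) * g_hat), so it changes by a
    relative factor rather than by an absolute amount, and its sign is preserved.
    """
    g_hat = g / (np.sqrt(np.mean(g ** 2)) + 1e-12)  # normalise the gradient scale
    return w * np.exp(-lr * np.sign(w) * g_hat)

# Toy usage: minimise f(w) = sum(w^2) from a random start.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
for _ in range(100):
    g = 2 * w                      # gradient of sum(w^2)
    w = multiplicative_update(w, g)
print(w)                           # weight magnitudes shrink multiplicatively towards zero
```

Because each step rescales a weight rather than shifting it, the relative change per step is bounded by the learning rate, which is one intuition for why such updates can be less sensitive to learning rate tuning in deep compositional models.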