In this work, we comprehensively reveal the learning dynamics of neural networks trained with normalization, weight decay (WD), and SGD (with momentum), which we call Spherical Motion Dynamics (SMD). Most related works study SMD by focusing on the "effective learning rate" under the "equilibrium" condition, in which the weight norm remains unchanged. However, their discussion of why the equilibrium condition can be reached in SMD is either absent or less than convincing. Our work investigates SMD by directly exploring the cause of the equilibrium condition. Specifically, 1) we introduce assumptions that lead to the equilibrium condition in SMD, and prove that under these assumptions the weight norm converges at a linear rate; 2) we propose the "angular update" as a substitute for the effective learning rate to measure the evolution of neural networks in SMD, and prove that the angular update also converges to its theoretical value at a linear rate; 3) we verify our assumptions and theoretical results on various computer vision tasks, including ImageNet and MSCOCO, under standard settings. Experimental results show that our theoretical findings agree well with empirical observations.
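The equilibrium condition and the angular update can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions that are ours, not the paper's: plain SGD without momentum, stochastic gradients that are exactly orthogonal to the weight (as holds for a scale-invariant loss, e.g. weights followed by a normalization layer), and a gradient norm equal to c/||w|| for a constant c. Under these assumptions ||w_{t+1}||² = (1 − ηλ)²||w_t||² + η²c²/||w_t||², whose fixed point gives the equilibrium norm; at that point the angle Δ between consecutive weights satisfies cos Δ = 1 − ηλ, so Δ ≈ √(2ηλ). The helpers `simulate_smd` and `smd_equilibrium` are hypothetical names used only for illustration.

```python
import numpy as np


def smd_equilibrium(eta, lam, c):
    """Fixed point of the toy norm recursion (hypothetical helper).

    x_{t+1} = (1 - eta*lam)**2 * x_t + (eta*c)**2 / x_t, with x = ||w||^2.
    Returns (equilibrium weight norm, equilibrium angular update in radians).
    """
    x_star = eta * c / np.sqrt(eta * lam * (2.0 - eta * lam))
    return np.sqrt(x_star), np.arccos(1.0 - eta * lam)


def simulate_smd(eta=0.1, lam=1e-2, c=1.0, dim=100, steps=10000, seed=0):
    """Toy SMD run: plain SGD + weight decay on a scale-invariant weight.

    Illustrative assumptions: no momentum; every stochastic gradient is
    orthogonal to w and has norm c / ||w||.
    Returns per-step weight norms and angular updates (radians).
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)
    norms, angles = [], []
    for _ in range(steps):
        r = rng.standard_normal(dim)
        # Remove the component along w, so that <g, w> = 0.
        g = r - (r @ w) / (w @ w) * w
        # Rescale so that ||g|| = c / ||w||, mimicking a scale-invariant loss.
        g *= c / (np.linalg.norm(g) * np.linalg.norm(w))
        # SGD step with L2 weight decay folded in: w - eta * (g + lam * w).
        w_new = (1.0 - eta * lam) * w - eta * g
        # Angular update: angle between consecutive weight vectors.
        cos = (w @ w_new) / (np.linalg.norm(w) * np.linalg.norm(w_new))
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
        w = w_new
        norms.append(np.linalg.norm(w))
    return np.array(norms), np.array(angles)


norms, angles = simulate_smd()
norm_star, angle_star = smd_equilibrium(0.1, 1e-2, 1.0)
print(norms[-1], norm_star)    # the two values should agree closely
print(angles[-1], angle_star)  # both close to sqrt(2 * eta * lam)
```

In this toy recursion, linearizing around the fixed point shows that the deviation of ||w||² shrinks by a factor of about 2(1 − ηλ)² − 1 ≈ 1 − 4ηλ per step, which is a linear rate in the optimization sense; the paper's own analysis, which includes momentum, is not reproduced here.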