The stochastic mirror descent (SMD) algorithm is a general class of training algorithms that includes the celebrated stochastic gradient descent (SGD) as a special case. It utilizes a mirror potential to influence the implicit bias of the training algorithm. In this paper we explore the performance of the SMD iterates on mean-field ensemble models. Our results generalize earlier ones obtained for SGD on such models. The evolution of the distribution of parameters is mapped to a continuous-time process in the space of probability distributions. Our main result gives a nonlinear partial differential equation to which the continuous-time process converges in the asymptotic regime of large networks. The impact of the mirror potential appears through a multiplicative factor equal to the inverse of its Hessian, which can be interpreted as defining a gradient flow over an appropriately defined Riemannian manifold. We provide numerical simulations that allow us to study and characterize the effect of the mirror potential on the performance of networks trained with SMD on several binary classification problems.
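To make the setting concrete, the SMD update with mirror potential \(\psi\) and step size \(\eta\) can be written in the standard mirror-descent form below; the limiting dynamics are sketched in generic mean-field notation, where \(\Psi(\theta;\rho)\) stands for the relevant first variation of the risk (this notation is an assumption for illustration, not the paper's exact statement):
\[
\nabla\psi(w_{k+1}) \;=\; \nabla\psi(w_k) \;-\; \eta\,\nabla_w \ell(w_k;\xi_k),
\qquad
w_{k+1} \;=\; (\nabla\psi)^{-1}\!\bigl(\nabla\psi(w_k) - \eta\,\nabla_w \ell(w_k;\xi_k)\bigr),
\]
and, schematically, the limiting PDE takes the form
\[
\partial_t \rho_t(\theta) \;=\; \nabla_\theta \!\cdot\! \Bigl( \rho_t(\theta)\, \bigl[\nabla^2\psi(\theta)\bigr]^{-1} \nabla_\theta \Psi(\theta;\rho_t) \Bigr),
\]
where the factor \(\bigl[\nabla^2\psi(\theta)\bigr]^{-1}\) is the inverse-Hessian term described above; for \(\psi(\theta)=\tfrac12\|\theta\|_2^2\) it reduces to the identity and the SGD mean-field limit is recovered.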
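The following is a minimal runnable sketch of SMD on a toy binary classification problem, not the paper's ensemble model. The potential \(\psi(w)=\|w\|_q^q/q\) is one standard choice whose gradient map has a closed-form inverse; \(q=2\) recovers plain SGD. The data, loss, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of stochastic mirror descent (SMD) on logistic loss.
# Potential psi(w) = ||w||_q^q / q; q = 2 recovers SGD.
import numpy as np

def grad_psi(w, q):
    """Mirror map: gradient of psi(w) = ||w||_q^q / q (componentwise)."""
    return np.sign(w) * np.abs(w) ** (q - 1)

def grad_psi_inv(z, q):
    """Inverse mirror map (grad psi)^{-1}, also componentwise."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd(X, y, q=3.0, lr=0.05, steps=5000, seed=0):
    """Run SMD on the logistic loss; returns the learned weights."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        i = rng.integers(len(y))                   # one stochastic sample
        margin = y[i] * (X[i] @ w)
        # grad of log(1 + exp(-margin)); tanh form is numerically stable
        g = -y[i] * X[i] * 0.5 * (1.0 - np.tanh(0.5 * margin))
        w = grad_psi_inv(grad_psi(w, q) - lr * g, q)  # mirror step
    return w

# Toy linearly separable data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = np.sign(X @ w_true)

for q in (2.0, 3.0):
    w = smd(X, y, q=q)
    acc = np.mean(np.sign(X @ w) == y)
    print(f"q={q}: train accuracy {acc:.3f}")
```

Changing \(q\) changes the mirror potential, and hence the implicit bias of the iterates, without changing the loss being minimized; this is the kind of effect the numerical simulations in the paper characterize.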