Recent mean field interpretations of learning dynamics in over-parameterized neural networks offer theoretical insights into the empirical success of first-order optimization algorithms in finding global minima of the nonconvex risk landscape. In this paper, we explore applying mean field learning dynamics as a computational algorithm, rather than as an analytical tool. Specifically, we design a Sinkhorn-regularized proximal algorithm to approximate the distributional flow of the learning dynamics in the mean field regime over weighted point clouds. In this setting, a contractive fixed point recursion computes the time-varying weights, numerically realizing the interacting Wasserstein gradient flow of the parameter distribution supported over the neuronal ensemble. An appealing aspect of the proposed algorithm is that the measure-valued recursions allow meshless computation. We demonstrate the proposed computational framework of interacting weighted particle evolution on binary and multi-class classification. Our algorithm performs gradient descent on the free energy associated with the risk functional.
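To make the measure-valued recursion concrete, the sketch below implements a generic entropic (Sinkhorn-style) proximal step that updates the weights of a fixed particle cloud for a free energy of the form F(ρ) = ⟨V, ρ⟩ + β⁻¹⟨ρ, log ρ⟩, via a contractive scaling fixed point. This is an illustrative assumption in the spirit of the abstract, not the paper's exact recursion: the function name `sinkhorn_proximal_step`, the choice of free energy, and all parameter values are hypothetical.

```python
import numpy as np

def sinkhorn_proximal_step(x, rho, V, h=1e-2, eps=0.5, beta=1.0, iters=200):
    """One entropic Wasserstein-proximal step for the weights of a fixed
    point cloud, for the (assumed) free energy F(rho) = <V,rho> + (1/beta)<rho,log rho>.

    x    : (n, d) particle locations (held fixed; only weights evolve)
    rho  : (n,) current probability weights on the particles
    V    : (n,) potential (e.g. per-particle risk gradient term)
    h    : proximal step size, eps : entropic regularization, beta : inverse temperature
    """
    # Gibbs kernel built from the squared Euclidean cost over the cloud
    C = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    K = np.exp(-C / eps)

    lam = h / (beta * eps)          # relative weight of the entropy term
    a = np.ones_like(rho)
    b = np.ones_like(rho)
    for _ in range(iters):
        # enforce the first marginal: pi @ 1 = rho
        a = rho / (K @ b)
        s = K.T @ a
        # KL-proximal of h*F applied to the free second marginal
        # (closed form for linear potential + entropy)
        b = (s * np.exp(-h * V / eps - lam)) ** (1.0 / (1.0 + lam)) / s
    # updated weights = second marginal of the optimal coupling
    return b * (K.T @ a)
```

In this scheme the inner loop is the contractive fixed point: the two scaling vectors `a`, `b` play the role of Sinkhorn potentials, and the step rebalances mass toward particles with lower potential `V` while the entropy term keeps the weights diffuse.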