We examine the zero-temperature Metropolis Monte Carlo algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and as shown empirically by other authors, Metropolis Monte Carlo can train a neural network with an accuracy comparable to that of gradient descent, if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo algorithm, aMC, to overcome these limitations. The intrinsic stochasticity of the Monte Carlo method allows aMC to train neural networks in which the gradient is too small to allow training by gradient descent. We suggest that, as for molecular simulation, Monte Carlo methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.
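As an illustration of the zero-temperature Metropolis procedure described above, the following minimal sketch trains a tiny one-hidden-layer network on a toy regression task. It is not the implementation used in this work: the network size, the Gaussian proposal applied to all parameters at once, the step size `sigma`, the toy data set, and the function names `predict`, `loss`, and `metropolis_zero_t` are all assumptions made for the example. The only ingredient taken from the text is the zero-temperature acceptance rule, under which a proposed move is kept only if the loss does not increase. The adaptive variant, aMC, is not shown.

```python
import math
import random


def predict(params, x, hidden=4):
    # Assumed parameter layout: [w1 (hidden), b1 (hidden), w2 (hidden), b2 (1)].
    w1 = params[:hidden]
    b1 = params[hidden:2 * hidden]
    w2 = params[2 * hidden:3 * hidden]
    b2 = params[3 * hidden]
    return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden)) + b2


def loss(params, data, hidden=4):
    # Mean-squared error over the whole (toy) data set.
    return sum((predict(params, x, hidden) - y) ** 2 for x, y in data) / len(data)


def metropolis_zero_t(data, hidden=4, sigma=0.05, steps=2000, seed=0):
    rng = random.Random(seed)
    params = [rng.gauss(0.0, 0.5) for _ in range(3 * hidden + 1)]
    current = loss(params, data, hidden)
    history = [current]
    for _ in range(steps):
        # Propose a Gaussian perturbation of every parameter (an assumption
        # of this sketch; other proposal schemes are possible).
        trial = [p + rng.gauss(0.0, sigma) for p in params]
        trial_loss = loss(trial, data, hidden)
        # Zero-temperature Metropolis: accept only if the loss does not increase.
        if trial_loss <= current:
            params, current = trial, trial_loss
        history.append(current)
    return params, history


# Toy task (assumed): fit sin(x) on [-2, 2].
data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-20, 21)]
params, history = metropolis_zero_t(data)
print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

Because rejected moves leave the parameters unchanged, the recorded loss can never increase from one step to the next. No gradient is evaluated at any point, which is why a procedure of this kind remains applicable when gradients are too small to be useful.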