This work theoretically studies stochastic neural networks, one of the main types of neural networks in practical use. We prove that as the width of an optimized stochastic neural network tends to infinity, its predictive variance on the training set decreases to zero. Our theory justifies the common intuition that adding stochasticity can help regularize the model by introducing an averaging effect. Two common examples to which our theory is relevant are neural networks with dropout and, in a special limit, Bayesian latent variable models. Our result thus helps us better understand how stochasticity affects the learning of neural networks and can inform the design of better architectures for practical problems.
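The central claim, that the predictive variance on the training set shrinks as width grows, can be probed empirically. Below is a minimal sketch, not taken from the paper, that trains one-hidden-layer dropout MLPs of increasing width on a toy regression set and measures the variance of their stochastic predictions on the training inputs; all hyperparameters and helper names (`train_dropout_mlp`, `predictive_variance`) are assumptions chosen for illustration.

```python
# Hypothetical illustration (not the paper's experiment): measure the
# predictive variance of dropout MLPs of increasing width on their own
# training data.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny 1-D regression training set.
x = torch.linspace(-1.0, 1.0, 32).unsqueeze(1)
y = torch.sin(3.0 * x)

def train_dropout_mlp(width, p=0.2, steps=2000):
    """Train a one-hidden-layer MLP with dropout on (x, y)."""
    model = nn.Sequential(
        nn.Linear(1, width),
        nn.ReLU(),
        nn.Dropout(p),          # the source of stochasticity under study
        nn.Linear(width, 1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    model.train()               # keep dropout active during optimization
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model

def predictive_variance(model, samples=200):
    """Variance over stochastic forward passes, averaged over training inputs."""
    model.train()               # leave dropout on to sample predictions
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(samples)])  # (samples, n, 1)
    return preds.var(dim=0).mean().item()

for width in [16, 64, 256, 1024]:
    model = train_dropout_mlp(width)
    print(f"width={width:5d}  mean predictive variance={predictive_variance(model):.5f}")
```

Under the theory summarized above, one would expect the reported variance to trend toward zero as the width increases, reflecting the averaging effect of the stochastic units.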