We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function ("erf") activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds, they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
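For concreteness, a minimal sketch of the network class and activations referred to above; the illustrative form $f_{U,V}(x) = V\,\sigma(Ux)$ is an assumption for exposition, and the exact parameterisation (weight scaling, biases) used in the paper may differ:
\[
f_{U,V}(x) = V\,\sigma(Ux), \qquad \|x\|_2 = 1,
\]
with the elementwise activation $\sigma$ taken to be either
\[
\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^{z} e^{-t^2}\,\mathrm{d}t
\quad\text{or}\quad
\operatorname{GELU}(z) = z\,\Phi(z),
\]
where $\Phi$ denotes the standard normal cumulative distribution function; both are standard definitions.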