Considering a probability distribution over parameters is known to be an effective strategy for learning neural networks with non-differentiable activation functions. We study the expectation of a probabilistic neural network as a predictor in its own right, focusing on the aggregation of binary activated neural networks with normal distributions over real-valued weights. Our work builds on a recent PAC-Bayesian analysis that yields tight generalization bounds and learning procedures for the expected output of such an aggregation, which is given by an analytical expression. While the combinatorial nature of this expression has been circumvented by approximations in previous works, we show that its exact computation remains tractable for deep but narrow neural networks, thanks to a dynamic programming approach. This leads us to a distinctive bound-minimization learning algorithm for binary activated neural networks, in which the forward pass propagates probabilities over representations rather than activation values. We also propose a stochastic counterpart that scales to wide architectures.
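To make the idea of propagating probabilities over binary representations concrete, here is a minimal sketch (not the authors' implementation) for a single hidden layer under the common assumption of isotropic Gaussian posteriors, where the expected sign activation admits the closed form E_{w ~ N(mu, I)}[sgn(w . x)] = erf(mu . x / (sqrt(2) ||x||)). For one hidden layer the expectation reduces to a direct sum over the 2^d binary representations; the dynamic programming mentioned above extends this layer by layer for deeper networks. All names (`prob_plus_one`, `expected_output`, the toy dimensions) are illustrative assumptions.

```python
# Minimal sketch: exact expected output of a one-hidden-layer binary activated
# network with Gaussian weight posteriors, by enumerating hidden binary patterns.
# Assumes isotropic N(mu, I) posteriors; names and dimensions are illustrative.
import itertools
import numpy as np
from scipy.special import erf

def prob_plus_one(mu, x):
    """P[sgn(w . x) = +1] for w ~ N(mu, I)."""
    return 0.5 * (1.0 + erf(mu @ x / (np.sqrt(2.0) * np.linalg.norm(x))))

def expected_output(mu_hidden, mu_out, x):
    """Expected sign output, averaged over the distribution of the hidden
    binary representation s in {-1, +1}^d induced by the Gaussian weights."""
    # Per-neuron probability of activating to +1 (probabilities, not activations).
    p = np.array([prob_plus_one(mu_i, x) for mu_i in mu_hidden])
    total = 0.0
    for s in itertools.product([-1.0, 1.0], repeat=len(mu_hidden)):
        s = np.array(s)
        prob_s = np.prod(np.where(s > 0, p, 1.0 - p))  # P[hidden pattern = s]
        total += prob_s * erf(mu_out @ s / (np.sqrt(2.0) * np.linalg.norm(s)))
    return total

# Toy usage: 3 inputs, 4 hidden neurons (2^4 = 16 patterns), 1 output neuron.
rng = np.random.default_rng(0)
mu_hidden = rng.normal(size=(4, 3))
mu_out = rng.normal(size=4)
x = rng.normal(size=3)
print(expected_output(mu_hidden, mu_out, x))
```

The enumeration cost grows as 2^d in the layer width d, which is why the exact computation is presented as tractable only for narrow networks and a stochastic counterpart is needed for wide architectures.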