Recent works have suggested that finite Bayesian neural networks may outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width corrections to the network prior and posterior have been studied, but the asymptotics of learned features have not been fully characterized. Here, we argue that the leading finite-width corrections to the average feature kernels for any Bayesian network with linear readout and quadratic cost have a largely universal form. We illustrate this explicitly for two classes of fully connected networks: deep linear networks and networks with a single nonlinear hidden layer. Our results begin to elucidate which features of data wide Bayesian neural networks learn to represent.
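To make the central object concrete, here is a minimal sketch (not the paper's derivation) of the empirical feature kernel for a network with a single nonlinear hidden layer and linear readout; all variable names, shapes, and the choice of `tanh` nonlinearity are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_in, n_hidden = 5, 3, 1000

# Random inputs and first-layer weights (standard Gaussian prior scaling)
X = rng.standard_normal((n_samples, n_in))
W = rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in)

# Hidden-layer features and the empirical feature kernel
Phi = np.tanh(X @ W)
K = Phi @ Phi.T / n_hidden

# At infinite width K concentrates at a fixed Gaussian-process kernel; at
# finite width it fluctuates and adapts, and the leading 1/n_hidden
# corrections to its posterior average are what the abstract refers to.
print(K.shape)  # (5, 5)
```

The normalization by `n_hidden` makes the kernel comparable across widths, which is what allows a perturbative expansion in inverse width.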