Recent works have suggested that finite Bayesian neural networks may sometimes outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width corrections to the network prior and posterior have been studied, but the asymptotics of learned features have not been fully characterized. Here, we argue that the leading finite-width corrections to the average feature kernels for any Bayesian network with linear readout and Gaussian likelihood have a largely universal form. We illustrate this explicitly for three tractable network architectures: deep linear fully-connected and convolutional networks, and networks with a single nonlinear hidden layer. Our results begin to elucidate how task-relevant learning signals shape the hidden layer representations of wide Bayesian neural networks.
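The deep linear case mentioned above can be made concrete with a small numerical sketch. For a deep linear network with weights drawn i.i.d. as $W \sim \mathcal{N}(0, 1/\text{fan-in})$, the prior-averaged feature kernel at any finite width already coincides with the infinite-width (NNGP) kernel $XX^\top/d$; finite width contributes $O(1/n)$ fluctuations around this mean, and it is the task-driven posterior corrections that the abstract refers to. The dimensions, widths, and sample counts below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 3                                # input dimension (assumed)
widths = [50, 50]                    # two hidden layers of width 50 (assumed)
X = rng.standard_normal((4, d))      # four example inputs

def feature_kernel(X, widths, rng):
    """Empirical last-layer feature kernel h h^T / n_L for one prior draw
    of a deep linear network with W_l ~ N(0, 1/fan_in)."""
    h, fan_in = X, X.shape[1]
    for n_l in widths:
        W = rng.standard_normal((fan_in, n_l)) / np.sqrt(fan_in)
        h, fan_in = h @ W, n_l
    return h @ h.T / widths[-1]

# Infinite-width (NNGP) kernel of a deep linear network: X X^T / d.
K_inf = X @ X.T / d

# Monte Carlo average of the finite-width prior kernel.
K_avg = np.mean([feature_kernel(X, widths, rng) for _ in range(2000)], axis=0)

# The averaged finite-width kernel matches the infinite-width kernel
# up to Monte Carlo error; individual draws fluctuate at O(1/n).
print(np.max(np.abs(K_avg - K_inf)))
```

Under the posterior (with a linear readout and Gaussian likelihood, as in the abstract), this equality no longer holds exactly, and the leading $O(1/n)$ correction to the averaged kernel is the quantity whose universal form the paper characterizes.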