Predictive Coding Networks (PCNs) aim to learn a generative model of the world. Given observations, this generative model can then be inverted to infer the causes of those observations. However, when training PCNs, a notable pathology is often observed in which inference accuracy peaks and then declines with further training. This cannot be explained by overfitting, since training and test accuracy decline together. Here we provide a thorough investigation of this phenomenon and show that it is caused by an imbalance between the speeds at which the various layers of the PCN converge. We demonstrate that this can be prevented by regularising the weight matrices at each layer: by restricting the relative size of the matrix singular values, we allow the weight matrix to change while limiting the overall impact a layer can have on its neighbours. We also demonstrate that a similar effect can be achieved through a simpler, more biologically plausible scheme of just capping the weights.
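As a rough sketch of the two mitigation schemes named above, the NumPy snippet below shows (a) a singular-value clipping step that bounds the relative spread of a weight matrix's singular values and (b) a plain element-wise weight cap. The function names and the `max_ratio` and `cap` parameters are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def clip_singular_values(W, max_ratio=10.0):
    """One plausible reading of the singular-value regulariser (an assumption,
    not necessarily the paper's exact scheme): clip each singular value so that
    none exceeds max_ratio times the smallest, bounding the layer's relative gain."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_clipped = np.minimum(s, max_ratio * s.min())
    return U @ np.diag(s_clipped) @ Vt

def cap_weights(W, cap=1.0):
    """The simpler, more biologically plausible alternative: cap each weight's
    magnitude element-wise, requiring no matrix decomposition."""
    return np.clip(W, -cap, cap)

# Illustrative usage after a hypothetical weight update:
W = np.random.randn(64, 64)
W = clip_singular_values(W, max_ratio=10.0)  # or: W = cap_weights(W, cap=1.0)
```

Applied after each weight update, either step would constrain how strongly a layer can drive its neighbours while leaving the weights otherwise free to change.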