Backpropagation (BP) is the most successful and widely used algorithm in deep learning. However, the computations required by BP are challenging to reconcile with known neurobiology. This difficulty has stimulated interest in more biologically plausible alternatives to BP. One such algorithm is the inference learning algorithm (IL). IL has close connections to neurobiological models of cortical function and has matched the performance of BP on supervised learning and auto-associative tasks. In contrast to BP, however, the mathematical foundations of IL are not well understood. Here, we develop a novel theoretical framework for IL. Our main result is that IL closely approximates an optimization method known as implicit stochastic gradient descent (implicit SGD), which is distinct from the explicit SGD implemented by BP. Our results further show how the standard implementation of IL can be altered to better approximate implicit SGD. Our novel implementation considerably improves the stability of IL across learning rates, which is consistent with our theory, as a key property of implicit SGD is its stability. We provide extensive simulation results that further support our theoretical interpretations and also demonstrate that IL achieves quicker convergence when trained with small mini-batches while matching the performance of BP for large mini-batches.
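For reference, the contrast drawn above is between the standard (textbook) explicit and implicit SGD updates; the following formulation is included only to fix notation and is not specific to our framework. With learning rate $\alpha$ and mini-batch loss $\ell_t$,

\[
\theta_{t+1} = \theta_t - \alpha \nabla \ell_t(\theta_t) \quad \text{(explicit SGD)}, \qquad
\theta_{t+1} = \theta_t - \alpha \nabla \ell_t(\theta_{t+1}) \quad \text{(implicit SGD)}.
\]

The implicit update evaluates the gradient at the new iterate, which is equivalent to the proximal step $\theta_{t+1} = \arg\min_{\theta} \big[ \ell_t(\theta) + \tfrac{1}{2\alpha}\|\theta - \theta_t\|^2 \big]$, a form known to remain stable over a much wider range of learning rates than the explicit update.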