In a recent paper, Wunderlich and Pehle introduced the EventProp algorithm, which enables training spiking neural networks by gradient descent on exact gradients. In this paper we present extensions of EventProp to support a wider class of loss functions, together with an implementation in the GPU enhanced neuronal networks framework that exploits sparsity. The GPU acceleration allows us to test EventProp extensively on more challenging learning benchmarks. We find that EventProp performs well on some tasks, but for others learning is slow or fails entirely. Here, we analyse these issues in detail and discover that they relate to the use of the exact gradient of the loss function, which by its nature does not provide information about loss changes due to spike creation or spike deletion. Depending on the details of the task and loss function, descending the exact gradient with EventProp can lead to the deletion of important spikes and so to an inadvertent increase of the loss and a decrease of classification accuracy, and hence a failure to learn. In other situations, the lack of knowledge about the benefit of creating additional spikes can lead to a lack of gradient flow into earlier layers, slowing down learning. We eventually present a first glimpse of a solution to these problems in the form of `loss shaping', where we introduce a suitable weighting function into an integral loss to increase gradient flow from the output layer towards earlier layers.
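To illustrate the idea of loss shaping, a weighted integral loss could, as a hedged sketch with notation not taken from the paper, take the form
\[
\mathcal{L} = \int_0^T w(t)\, l\big(V^{\mathrm{out}}(t), y\big)\, \mathrm{d}t ,
\]
where $l$ is an instantaneous loss on the output-layer membrane potentials $V^{\mathrm{out}}(t)$ for target label $y$, and $w(t)$ is a weighting function chosen to emphasise parts of the trial in which gradient flow towards earlier layers would otherwise be weak. The symbols $w$, $l$, $V^{\mathrm{out}}$ and $y$ are illustrative assumptions, not the definitions used in the paper.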