Deep learning uses neural networks that are parameterised by their weights. These networks are usually trained by tuning the weights to directly minimise a given loss function. In this paper we propose to re-parameterise the weights into targets for the firing strengths of the individual nodes in the network. Given a set of targets, it is possible to calculate the weights which make the firing strengths best meet those targets. We argue that training with targets addresses the problem of exploding gradients, through a process we call cascade untangling, and makes the loss-function surface smoother to traverse. This leads to easier, faster training and potentially better generalisation of the neural network, and it also allows for easier learning of deeper and recurrent network structures. The necessary conversion of targets to weights comes at an extra computational expense, which is in many cases manageable. Learning in target space can be combined with existing neural-network optimisers for additional gain. Experimental results demonstrate the speed of learning in target space and examples of improved generalisation for fully-connected and convolutional networks, as well as the ability of recurrent networks to recall and process long time sequences and to perform natural-language processing.
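To make the core re-parameterisation concrete, below is a minimal sketch, not the paper's implementation, of how targets for a single fully-connected layer's firing strengths could be converted back into weights via a least-squares fit. The array names `X`, `T`, and `W`, and the batch and layer sizes, are illustrative assumptions.

```python
import numpy as np

# Hypothetical single fully-connected layer, re-parameterised by targets.
# X: input activations arriving at the layer (batch_size x n_in).
# T: learnable targets for the layer's firing strengths (batch_size x n_out).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))
T = rng.normal(size=(32, 5))

# Convert targets to weights: choose W minimising ||X W - T||^2 (least squares),
# i.e. the weights that make the firing strengths best meet the targets.
W, *_ = np.linalg.lstsq(X, T, rcond=None)

# The realised firing strengths are the least-squares fit to the targets;
# during training, the loss gradient would be taken with respect to T, not W.
firing_strengths = X @ W
print("residual norm:", np.linalg.norm(firing_strengths - T))
```

In this sketch the least-squares solve is the "conversion of targets to weights" mentioned above; its cost is the extra computational expense that the abstract notes is in many cases manageable.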