In this paper, we explore a process called neural teleportation, a mathematical consequence of applying quiver representation theory to neural networks. Neural teleportation "teleports" a network to a new position in weight space while leaving its function unchanged. This concept generalizes the positive scale invariance of ReLU networks to networks with any activation function and any architecture. We shed light on the surprising and counter-intuitive consequences neural teleportation has on the loss landscape. In particular, we show that teleportation can be used to explore loss level curves, and that it changes the loss landscape, sharpens global minima, and boosts back-propagated gradients. From these observations, we demonstrate that teleportation accelerates training when used at initialization, regardless of the model, its activation function, the loss function, or the training data. Our results can be reproduced with the code available here: https://github.com/vitalab/neuralteleportation.
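To make the underlying idea concrete, the following is a minimal sketch (not the authors' code, and with hypothetical layer sizes) of the positive scale invariance of ReLU networks that neural teleportation generalizes: multiplying one layer's weights by a positive scalar and dividing the next layer's weights by the same scalar moves the network to a new point in weight space without changing its function.

```python
# Minimal illustration of positive scale invariance in a two-layer ReLU network.
# This is an assumed toy example, not the method or code from the paper.
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer ReLU network with hypothetical dimensions.
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((4, 16))

def forward(W1, W2, x):
    # Hidden ReLU layer followed by a linear output layer.
    return W2 @ np.maximum(W1 @ x, 0.0)

x = rng.standard_normal(8)
c = 3.7  # any positive scaling factor

# "Teleport" the weights: same function, different point in weight space.
W1_tele, W2_tele = c * W1, W2 / c

assert np.allclose(forward(W1, W2, x), forward(W1_tele, W2_tele, x))
```

Because ReLU is positively homogeneous, ReLU(c·W1·x) = c·ReLU(W1·x) for c > 0, so the 1/c factor in the second layer exactly cancels the rescaling; the paper's teleportation extends this kind of function-preserving weight change beyond ReLU and beyond simple scaling.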