The highly structured energy landscape of the loss, viewed as a function of the parameters of a deep neural network, makes it necessary to use sophisticated optimization strategies to discover (local) minima that guarantee reasonable performance. Escaping poorly performing local minima is an important prerequisite, and momentum methods are often employed to achieve this. As in other non-local optimization procedures, this creates the need to balance exploration against exploitation. In this work, we propose an event-based control mechanism that switches from exploration to exploitation once a predefined reduction of the loss function has been reached. Giving the momentum method a port-Hamiltonian interpretation, we adopt the 'heavy ball with friction' picture and trigger braking (i.e., friction) when certain goals are achieved. We benchmark our method against standard stochastic gradient descent and provide experimental evidence for improved performance of deep neural networks when our strategy is applied.
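The control idea described above can be sketched in a few lines: run heavy-ball momentum descent and, when the loss first falls below a predefined fraction of its initial value, increase the friction (lower the momentum coefficient) to switch from exploration to exploitation. This is a minimal illustrative sketch, not the paper's implementation; all function names, the trigger condition, and the default coefficients are assumptions chosen for clarity.

```python
import numpy as np

def heavy_ball_with_event_braking(grad, loss, x0, lr=0.01, mu=0.9,
                                  target_ratio=0.5, braked_mu=0.5,
                                  steps=500):
    """Heavy-ball (momentum) descent with an event-based friction switch.

    Illustrative sketch: once loss(x) <= target_ratio * initial loss,
    the momentum coefficient is dropped from `mu` to `braked_mu`,
    i.e. friction is increased ("braking") to favor exploitation.
    All parameter names and defaults are hypothetical.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)           # velocity (momentum buffer)
    momentum = mu
    initial_loss = loss(x)
    triggered = False
    for _ in range(steps):
        v = momentum * v - lr * grad(x)   # heavy-ball velocity update
        x = x + v
        if not triggered and loss(x) <= target_ratio * initial_loss:
            momentum = braked_mu          # event: trigger braking
            triggered = True
    return x, triggered

# Usage on a simple quadratic bowl f(x) = ||x||^2
f = lambda x: float(np.sum(x ** 2))
g = lambda x: 2 * x
x_final, switched = heavy_ball_with_event_braking(g, f, x0=[3.0, -2.0])
```

In a real training loop the trigger would compare a running estimate of the (stochastic) loss against the goal, since individual mini-batch losses are noisy; the one-shot comparison here is kept for brevity.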