Neural network (NN)-based learning algorithms are strongly affected by the choice of initialization and by the data distribution. Various optimization strategies have been proposed to improve the learning trajectory and find better optima. However, designing improved optimization strategies is a difficult task under the conventional landscape view. Here, we propose persistent neurons, a trajectory-based strategy that optimizes the learning task using information from previously converged solutions. More precisely, we utilize the ends of trajectories and let the parameters explore new landscapes by penalizing the model for converging to previous solutions under the same initialization. Persistent neurons can be regarded as a stochastic gradient method with an informed bias, where individual updates are corrupted by deterministic error terms. Specifically, we show that under certain data distributions, persistent neurons converge to more optimal solutions where initializations under popular frameworks find bad local minima. We further demonstrate that persistent neurons help improve the model's performance under both good and poor initializations. We evaluate the full and partial persistent models and show that they can be used to boost performance on a range of NN architectures, such as AlexNet and residual neural networks (ResNet).
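To make the penalty idea concrete, the sketch below adds a repulsion term to the training loss that grows as the current parameters approach any previously converged solution, then restarts training from the same initialization. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the inverse-squared-distance penalty form, the function name `persistent_penalty`, the `strength` coefficient, and the synthetic data setup are all illustrative choices.

```python
import torch
import torch.nn as nn

def persistent_penalty(params, prev_solutions, strength=1e-2):
    """Hypothetical repulsion penalty: discourages the parameters from
    re-converging to previously found solutions. Its gradient acts as the
    deterministic error term added to each SGD update."""
    theta = torch.cat([p.flatten() for p in params])
    penalty = torch.zeros((), dtype=theta.dtype)
    for prev in prev_solutions:  # each prev: flattened parameter snapshot
        # Inverse squared distance: large near a stored solution,
        # vanishing far from all of them.
        penalty = penalty + 1.0 / (torch.sum((theta - prev) ** 2) + 1e-8)
    return strength * penalty

# --- minimal synthetic setup, for illustration only ---
torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
init_state = {k: v.clone() for k, v in model.state_dict().items()}
criterion = nn.MSELoss()

prev_solutions = []
for run in range(3):                      # several restarts...
    model.load_state_dict(init_state)     # ...from the SAME initialization
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    for step in range(200):
        loss = criterion(model(X), y)
        loss = loss + persistent_penalty(model.parameters(), prev_solutions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Store the end of this trajectory so later runs are pushed elsewhere.
    prev_solutions.append(
        torch.cat([p.detach().flatten() for p in model.parameters()])
    )
    print(f"run {run}: final loss {criterion(model(X), y).item():.4f}")
```

Viewed this way, each update is ordinary SGD plus a deterministic bias: near a stored solution the repulsion gradient pushes the trajectory away, and far from all stored solutions it effectively vanishes, leaving the underlying optimization untouched.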