Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings, as it circumvents several problems of end-to-end back-propagation. However, it suffers from a stagnation problem: early modules overfit, and modules added beyond a certain depth no longer improve test accuracy. We propose to solve this issue by introducing a simple module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. The method, which we call TRGL for Transport Regularized Greedy Learning, is particularly well suited to residual networks. We study it theoretically, proving that it yields greedy modules that are regular and that successively solve the task. Experimentally, we show that module-wise trained networks reach higher accuracy when our regularization is added.
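The abstract does not spell out the form of the regularizer, so the following is only a minimal sketch of module-wise training with a transport penalty. It assumes the penalty is a squared-displacement (transport-cost) term $\tau \, \mathbb{E}\|F(x) - x\|^2$ on each residual module, the natural proximal term suggested by the minimizing movement scheme; the names `ResidualModule`, `train_module_greedily`, and `tau` are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    """Illustrative residual module x -> x + f(x); the 'transport' it applies
    to its input distribution is the displacement f(x)."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.f(x)

def train_module_greedily(module, head, frozen_stack, loader,
                          tau=0.1, epochs=1, lr=1e-3):
    """Train one module (plus an auxiliary classifier head) on top of the
    frozen earlier modules. The loss adds tau * E||out - in||^2, a
    transport-cost penalty keeping the module close to the identity, in
    the spirit of a minimizing-movement (proximal) step. (Assumed form.)"""
    opt = torch.optim.Adam(list(module.parameters()) + list(head.parameters()),
                           lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():          # earlier modules stay frozen
                h = frozen_stack(x)
            out = module(h)
            task_loss = ce(head(out), y)
            transport = (out - h).pow(2).sum(dim=1).mean()
            loss = task_loss + tau * transport
            opt.zero_grad()
            loss.backward()
            opt.step()

# Greedy stacking: train each module in turn, then freeze it and append it.
# stack = nn.Sequential()
# for k in range(num_modules):
#     m = ResidualModule(dim, hidden)
#     head = nn.Linear(dim, num_classes)
#     train_module_greedily(m, head, stack, loader, tau=0.1)
#     for p in m.parameters():
#         p.requires_grad_(False)
#     stack = nn.Sequential(*stack, m)
```

Under this reading, the penalty discourages each greedy module from overfitting by bounding how far it may move the representation, which is consistent with the abstract's claims that the modules are regular and that depth keeps helping.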