We propose a learning framework based on stochastic Bregman iterations to train sparse neural networks with an inverse scale space approach. We derive a baseline algorithm called LinBreg, an accelerated version using momentum, and AdaBreg, a Bregmanized generalization of the Adam algorithm. In contrast to established methods for sparse training, the proposed family of algorithms constitutes a regrowth strategy for neural networks that is purely optimization-based, without additional heuristics. Our Bregman learning framework starts training with very few initial parameters and successively adds only significant ones, yielding a sparse yet expressive network. The proposed approach is remarkably simple and efficient, and it is supported by the rich mathematical theory of inverse scale space methods. We derive a statistically grounded sparse parameter initialization strategy and provide a rigorous stochastic convergence analysis of the loss decay, together with additional convergence proofs in the convex regime. Using only 3.4% of the parameters of ResNet-18, we achieve 90.2% test accuracy on CIFAR-10, compared to 93.6% for the dense network. Our algorithm also unveils an autoencoder architecture for a denoising task. The proposed framework further has strong potential for integrating sparse backpropagation and resource-friendly training.
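To make the core idea concrete, the following is a minimal sketch of a stochastic linearized Bregman update of the kind LinBreg builds on, applied to a toy sparse regression problem. It assumes the standard linearized Bregman form with an elastic-net regularizer J(theta) = lam*||theta||_1 + ||theta||^2/(2*delta), so the primal update is a scaled soft-thresholding of the subgradient variable; all variable names and hyperparameter values (tau, lam, delta, batch_size) are illustrative and not the paper's reference implementation.

```python
# Sketch of a stochastic linearized Bregman (LinBreg-style) update for a
# sparse linear model. Assumes the standard linearized Bregman iteration:
#   v_{k+1} = v_k - tau * stochastic_gradient(theta_k)
#   theta_{k+1} = delta * soft_threshold(v_{k+1}, lam)
# Hyperparameters and problem setup are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem with a sparse ground-truth parameter vector.
n_samples, n_features = 200, 50
theta_true = np.zeros(n_features)
theta_true[:5] = rng.normal(size=5)
X = rng.normal(size=(n_samples, n_features))
y = X @ theta_true + 0.01 * rng.normal(size=n_samples)

def soft_threshold(v, lam):
    """Soft-thresholding, the proximal map of lam * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

tau, lam, delta = 0.01, 0.1, 1.0   # step size, sparsity level, elastic-net scale
batch_size, n_steps = 20, 2000

# The subgradient variable v starts at zero, so theta starts at zero: a
# parameter only becomes active once |v| exceeds lam, mimicking the
# "start sparse, regrow" behavior described in the abstract.
v = np.zeros(n_features)
theta = delta * soft_threshold(v, lam)

for k in range(n_steps):
    idx = rng.choice(n_samples, size=batch_size, replace=False)
    grad = X[idx].T @ (X[idx] @ theta - y[idx]) / batch_size  # stochastic gradient
    v -= tau * grad                          # dual / subgradient update
    theta = delta * soft_threshold(v, lam)   # primal update via prox of J

print("nonzero coefficients:", np.count_nonzero(theta), "of", n_features)
```

In this sketch the inverse scale space behavior is visible directly: coefficients with consistently large gradients cross the threshold early and enter the model first, while insignificant ones stay exactly zero.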