Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training

Recent work on sparse neural network training (sparse training) has shown that a compelling trade-off between performance and efficiency can be achieved by training intrinsically sparse neural networks from scratch. Existing sparse training methods usually strive to find the best possible sparse subnetwork in a single run, without involving any expensive dense or pre-training steps. For instance, dynamic sparse training (DST), one of the most prominent directions, can reach performance competitive with dense training by iteratively evolving the sparse topology during training. In this paper, we argue that it is better to allocate the limited resources to creating multiple low-loss sparse subnetworks and superposing them into a stronger one, instead of spending all resources on finding a single subnetwork. To achieve this, two desiderata must be met: (1) efficiently producing many low-loss subnetworks, the so-called cheap tickets, within one training process limited to the standard training time used in dense training; (2) effectively superposing these cheap tickets into one stronger subnetwork without exceeding the constrained parameter budget. To corroborate our conjecture, we present a novel sparse training approach, termed Sup-tickets, which can satisfy both desiderata concurrently in a single sparse-to-sparse training process. Across various modern architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates seamlessly with existing sparse training methods and yields consistent performance improvements.
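The superposition step in desideratum (2) can be illustrated with a minimal sketch. The sketch rests on three assumptions. First, each cheap ticket is a flattened weight vector with a binary mask. Second, "superposing" means averaging the tickets' weights over the union of their masks, with pruned positions counting as zero. Third, the parameter budget is enforced afterwards by keeping only the `budget` largest-magnitude averaged weights. The function name `superpose_tickets` and this averaging-plus-magnitude-pruning rule are hypothetical illustrations. They are not necessarily the exact procedure Sup-tickets uses.

```python
from typing import List, Sequence, Tuple


def superpose_tickets(
    weights: Sequence[Sequence[float]],
    masks: Sequence[Sequence[int]],
    budget: int,
) -> Tuple[List[float], List[int]]:
    """Hypothetical sketch: average K sparse tickets, then prune back to `budget` nonzeros.

    Assumes each ticket is a flattened weight vector plus a 0/1 mask of the same length.
    """
    if not weights or len(weights) != len(masks):
        raise ValueError("need the same, nonzero number of weight vectors and masks")
    if budget < 0:
        raise ValueError("budget must be non-negative")
    n = len(weights[0])
    if any(len(w) != n for w in weights) or any(len(m) != n for m in masks):
        raise ValueError("all weight vectors and masks must have the same length")

    k = len(weights)
    # Assumed superposition rule: plain average over tickets, pruned entries count as 0.
    avg = [sum(w[i] * m[i] for w, m in zip(weights, masks)) / k for i in range(n)]

    # Union of masks: positions that are active in at least one ticket.
    candidates = [i for i in range(n) if any(m[i] for m in masks)]

    # Enforce the parameter budget by magnitude: keep the `budget` largest |avg| entries.
    # sorted() is stable, so ties are broken by the lower index.
    ranked = sorted(candidates, key=lambda i: abs(avg[i]), reverse=True)
    keep = set(ranked[:budget])

    mask = [1 if i in keep else 0 for i in range(n)]
    superposed = [avg[i] * mask[i] for i in range(n)]
    return superposed, mask


if __name__ == "__main__":
    # Two hypothetical 4-weight tickets that share position 0.
    w, m = superpose_tickets(
        [[0.5, 0.0, -1.0, 0.0], [0.25, 0.5, 0.0, 0.0]],
        [[1, 0, 1, 0], [1, 1, 0, 0]],
        budget=2,
    )
    print(w, m)  # [0.375, 0.0, -0.5, 0.0] [1, 0, 1, 0]
```

Here the union of the two masks has three active positions. The averaged magnitudes at those positions are 0.375, 0.25 and 0.5, so a budget of 2 keeps positions 0 and 2, and the superposed network stays as sparse as each input ticket. The sketch uses plain Python lists to stay dependency-free. A practical version would presumably apply the same rule layer by layer on framework tensors.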
