Recent works on sparse neural network training (sparse training) have shown that a compelling trade-off between performance and efficiency can be achieved by training intrinsically sparse neural networks from scratch. Existing sparse training methods usually strive to find the best possible sparse subnetwork in a single run, without involving any expensive dense or pre-training steps. For instance, dynamic sparse training (DST) is capable of reaching performance competitive with dense training by iteratively evolving the sparse topology during the course of training. In this paper, we argue that it is better to allocate the limited resources to creating multiple low-loss sparse subnetworks and superposing them into a stronger one, instead of spending all resources on finding an individual subnetwork. To achieve this, two desiderata must be met: (1) efficiently producing many low-loss subnetworks, the so-called cheap tickets, within one training process limited to the standard training time used in dense training; (2) effectively superposing these cheap tickets into one stronger subnetwork. To corroborate our conjecture, we present a novel sparse training approach, termed Sup-tickets, which satisfies the above two desiderata concurrently in a single sparse-to-sparse training process. Across various modern architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates seamlessly with existing sparse training methods and demonstrates consistent performance improvement.
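To make desideratum (2) more concrete, the sketch below illustrates one simple way cheap tickets could be superposed: element-wise weight averaging of sparse checkpoints collected during a single DST run. This is a minimal illustration under stated assumptions; the function name `superpose_cheap_tickets`, the optional mask re-application, and the plain averaging rule are illustrative choices, not the exact Sup-tickets procedure described in the paper.

```python
import torch


def superpose_cheap_tickets(ticket_state_dicts, masks=None):
    """Hypothetical sketch: combine several sparse subnetworks ("cheap tickets")
    by element-wise averaging of their weights.

    ticket_state_dicts: list of state_dicts saved at different points of one
        sparse-to-sparse training run (the "cheap tickets").
    masks: optional dict of {param_name: binary_mask}; if given, the averaged
        weights are re-sparsified so the superposed model stays sparse.
    """
    superposed = {}
    for name in ticket_state_dicts[0]:
        # Stack the same parameter from every ticket and average element-wise.
        stacked = torch.stack([sd[name].float() for sd in ticket_state_dicts])
        superposed[name] = stacked.mean(dim=0)
    if masks is not None:
        # Keep only the weights covered by the provided sparse mask.
        for name, mask in masks.items():
            superposed[name] = superposed[name] * mask
    return superposed
```

In this reading, the averaging step costs only a few extra forward-pass-free tensor operations, which is consistent with the claim that the whole procedure fits within the standard training time of a single run.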