Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin conjecture that we can avoid this by training "lottery tickets", i.e., special sparse subnetworks found at initialization that can be trained to high accuracy. However, a subsequent line of work by Frankle et al. and Su et al. presents concrete evidence that current algorithms for finding trainable networks at initialization fail simple baseline comparisons, e.g., against training random sparse subnetworks. Finding lottery tickets that train to higher accuracy than these simple baselines remains an open problem. In this work, we resolve this open problem by proposing Gem-Miner, which finds lottery tickets at initialization that beat current baselines. Gem-Miner finds lottery tickets trainable to accuracy competitive with or better than Iterative Magnitude Pruning (IMP), and does so up to $19\times$ faster.
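For readers unfamiliar with the "train, prune, re-train" baseline mentioned above, the following is a minimal sketch of an Iterative Magnitude Pruning (IMP) loop, not the paper's implementation: the helpers `init_weights()` and `train()` are hypothetical placeholders for a full training pipeline, and only the magnitude-based masking logic is illustrated.

```python
# Minimal IMP-style sketch: repeatedly train, prune the smallest-magnitude
# surviving weights, and re-train from the original initialization.
import numpy as np

def init_weights(shape, seed=0):
    # Placeholder initialization (stands in for a real network's init).
    rng = np.random.default_rng(seed)
    return rng.normal(scale=0.1, size=shape)

def train(weights, mask, steps=100):
    # Placeholder for real SGD training; a real run would update `weights`
    # on data while keeping pruned entries (mask == 0) fixed at zero.
    return weights * mask

def imp(shape, rounds=5, prune_frac=0.2):
    w_init = init_weights(shape)          # weights at initialization
    mask = np.ones(shape)                 # start fully dense
    for _ in range(rounds):
        w = train(w_init.copy(), mask)    # train the surviving subnetwork
        # prune the smallest-magnitude surviving weights
        surviving = np.abs(w[mask == 1])
        threshold = np.quantile(surviving, prune_frac)
        mask = np.where((np.abs(w) > threshold) & (mask == 1), 1.0, 0.0)
        # next round re-trains from the original initialization ("rewinding")
    return mask

if __name__ == "__main__":
    final_mask = imp((256, 256))
    print("final sparsity:", 1 - final_mask.mean())
```

Each round of this loop requires a full re-training run, which is what makes IMP time-consuming; methods that find the sparse mask at initialization aim to avoid these repeated training passes.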