The Multi-Prize Lottery Ticket Hypothesis posits that randomly initialized neural networks contain several subnetworks that achieve accuracy comparable to fully trained models of the same architecture. However, current methods require that the network be sufficiently overparameterized. In this work, we propose a modification to two state-of-the-art algorithms (Edge-Popup and Biprop) that finds high-accuracy subnetworks with no additional storage cost or scaling. The algorithm, Iterative Weight Recycling, identifies subsets of important weights within a randomly initialized network for intra-layer reuse. Empirically, we show improvements on smaller network architectures and at higher prune rates, finding that model sparsity can be increased through the "recycling" of existing weights. In addition to Iterative Weight Recycling, we complement the Multi-Prize Lottery Ticket Hypothesis with a reciprocal finding: high-accuracy, randomly initialized subnetworks produce diverse masks, despite being generated with the same hyperparameters and pruning strategy. We explore the landscapes of these masks, which show high variability.
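To make the intra-layer reuse idea concrete, the following is a minimal PyTorch sketch. It assumes a per-weight importance score tensor, in the spirit of Edge-Popup's popup scores; the function name `recycle_weights`, the `frac` parameter, and the top-k selection rule are illustrative assumptions, not the paper's exact procedure.

```python
import torch

@torch.no_grad()
def recycle_weights(weight: torch.Tensor, scores: torch.Tensor,
                    frac: float = 0.1) -> None:
    """Illustrative sketch: overwrite a layer's least important weights,
    in place, with copies of its most important ones, so the layer reuses
    values it already stores rather than introducing new ones."""
    flat_w = weight.view(-1)   # view into the same storage (assumes a contiguous tensor)
    flat_s = scores.view(-1)
    k = int(frac * flat_w.numel())
    if k == 0:
        return
    low_idx = torch.topk(flat_s, k, largest=False).indices   # least important positions
    high_idx = torch.topk(flat_s, k, largest=True).indices   # most important positions
    # Fancy indexing on the right-hand side yields a copy, so there is no aliasing.
    flat_w[low_idx] = flat_w[high_idx]

# Example: recycle 10% of a frozen random layer's weights.
w = torch.randn(256, 128)   # frozen random weights, as in Edge-Popup
s = torch.rand_like(w)      # per-weight importance scores (learned in practice)
recycle_weights(w, s, frac=0.1)
```

In an Edge-Popup-style training loop, a routine like this would plausibly be invoked at intervals (e.g., once per epoch) while the scores continue to train; because recycled values are copies of weights the layer already stores, the network incurs no additional storage cost.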