The exponential growth in the number of neural network parameters over the past years has been accompanied by performance gains across several fields. However, due to their sheer size, these networks have not only become difficult to interpret but also problematic to train and use in real-world applications, since hardware requirements have increased accordingly. Tackling both issues, we present a novel approach that either drops a neural network's initial weights or inverts their respective signs. Put simply, a network is trained by selecting and sign-inverting its initial weights without changing their absolute values. Our contribution extends previous work on masking by additionally sign-inverting the initial weights and follows the findings of the Lottery Ticket Hypothesis. Through this extension and adaptations of initialization methods, we achieve a pruning rate of up to 99% while still matching or exceeding the performance of various baseline and previous models. Our approach has two main advantages. First, and most notably, signed Supermask models drastically simplify a model's structure while still performing well on the given tasks. Second, by reducing the neural network to its very foundation, we gain insight into which weights matter for performance.
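To make the idea concrete, the following is a minimal sketch (assuming PyTorch) of a linear layer whose frozen initial weights are multiplied element-wise by a learned ternary mask in {-1, 0, +1}, so that each weight is either kept, dropped, or sign-inverted while its absolute value never changes. The thresholded score parameterization and the straight-through gradient estimator used here are illustrative assumptions, not necessarily the exact training mechanism of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SignedSupermaskLinear(nn.Module):
    """Linear layer whose frozen initial weights are only kept, dropped,
    or sign-inverted via a learned ternary mask in {-1, 0, +1}."""

    def __init__(self, in_features, out_features, threshold=0.5):
        super().__init__()
        # Initial weights are fixed and never updated during training.
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features) * 0.05, requires_grad=False
        )
        # Real-valued scores from which the ternary mask is derived (assumed
        # parameterization for this sketch).
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.threshold = threshold

    def forward(self, x):
        # Ternarize the scores: small magnitude -> 0 (drop the weight),
        # large positive -> +1 (keep), large negative -> -1 (invert sign).
        mask = torch.sign(self.scores) * (self.scores.abs() > self.threshold).float()
        # Straight-through estimator: use the hard mask in the forward pass,
        # but let gradients flow to the underlying real-valued scores.
        mask = mask.detach() + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)


if __name__ == "__main__":
    layer = SignedSupermaskLinear(8, 4)
    out = layer(torch.randn(2, 8))
    out.sum().backward()
    # Only the mask scores receive gradients; the frozen weights do not.
    print(layer.scores.grad is not None, layer.weight.grad is None)
```

The key property illustrated here is that training updates only the mask scores; the underlying weight values are never changed, which matches the description of weight selection and inversion above.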