Random masks have been shown empirically to define surprisingly effective sparse neural network models. The resulting Erd\"os-R\'enyi (ER) random graphs can often compete with dense architectures, and even state-of-the-art lottery ticket pruning algorithms struggle to outperform them, although the random baselines require no computationally expensive prune-train iterations and can be drawn at initialization without significant computational overhead. We offer a theoretical explanation of how such ER masks can approximate arbitrary target networks if they are wider by a logarithmic factor in the inverse sparsity, $\log(1/\text{sparsity})$. While we are the first to show theoretically and experimentally that random ER source networks contain strong lottery tickets, we also prove the existence of weak lottery tickets that require a lower degree of overparametrization than strong lottery tickets. These unusual results rest on the observation that ER masks are well trainable in practice, which we verify in experiments with varied choices of random masks. Some of these data-free choices outperform previously proposed random approaches on standard image classification benchmark datasets.
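As a minimal illustration of the point that ER masks can be drawn at initialization without pruning-training iterations, the following sketch samples such a mask with NumPy. The function name `er_mask` and the density value are illustrative assumptions, not part of the paper; the key property is simply that each weight is kept independently with a fixed probability.

```python
import numpy as np

def er_mask(shape, density, rng=None):
    """Draw an Erdos-Renyi random mask: each entry is kept (set to 1)
    independently with probability `density` (= 1 - sparsity)."""
    rng = np.random.default_rng(rng)
    return (rng.random(shape) < density).astype(np.float32)

# Example: mask a 256x512 layer at 10% density (90% sparsity).
mask = er_mask((256, 512), density=0.1, rng=0)
weights = np.random.default_rng(1).standard_normal((256, 512))
sparse_weights = mask * weights  # data-free sparsification at init
```

Because the mask is data-free and drawn once, its cost is a single random draw per weight, in contrast to iterative prune-train schemes that require repeated training passes.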