Lottery tickets (LTs) are accurate, sparse subnetworks that can be trained in isolation to match the performance of dense networks. Ensembling, in parallel, is one of the oldest time-proven tricks in machine learning: it improves performance by combining the outputs of multiple independent models. However, the benefits of ensembling are diluted in the context of LTs, since an ensemble does not directly lead to stronger sparse subnetworks but only leverages their predictions for a better decision. In this work, we first observe that directly averaging the weights of adjacent learned subnetworks significantly boosts the performance of LTs. Encouraged by this observation, we further propose an alternative way to perform an 'ensemble' over the subnetworks identified by iterative magnitude pruning via a simple interpolation strategy. We call our method Lottery Pools. In contrast to the naive ensemble, which brings no performance gain to any single subnetwork, Lottery Pools yields much stronger sparse subnetworks than the original LTs without requiring any extra training or inference cost. Across various modern architectures on CIFAR-10/100 and ImageNet, we show that our method achieves significant performance gains in both in-distribution and out-of-distribution scenarios. Impressively, evaluated with VGG-16 and ResNet-18, the produced sparse subnetworks outperform the original LTs by up to 1.88% on CIFAR-100 and 2.36% on CIFAR-100-C; the resulting dense network surpasses the pre-trained dense model by up to 2.22% on CIFAR-100 and 2.38% on CIFAR-100-C.
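The weight interpolation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`interpolate_weights`, `apply_mask`, `select_alpha`), the dictionary-of-arrays weight format, the candidate coefficients, and the scoring callback are all assumptions made for illustration. The abstract does not specify how the interpolation coefficient is chosen or how sparsity is restored after averaging.

```python
import numpy as np

def interpolate_weights(w_a, w_b, alpha):
    """Linearly interpolate two weight dicts: alpha * w_a + (1 - alpha) * w_b."""
    return {name: alpha * w_a[name] + (1.0 - alpha) * w_b[name] for name in w_a}

def apply_mask(weights, masks):
    """Zero out pruned entries so the result keeps the target sparsity pattern."""
    return {name: weights[name] * masks[name] for name in weights}

def select_alpha(w_a, w_b, candidates, score_fn):
    """Pick the candidate coefficient whose interpolated weights score highest.

    score_fn is a hypothetical callback, e.g. accuracy on a held-out set.
    """
    return max(candidates, key=lambda a: score_fn(interpolate_weights(w_a, w_b, a)))

# Toy example: two "adjacent" subnetworks with a single layer named "fc".
w_a = {"fc": np.array([1.0, 0.0, 2.0, 0.0])}   # sparser subnetwork
w_b = {"fc": np.array([3.0, 4.0, 0.0, 0.0])}   # denser subnetwork
mask_a = {"fc": w_a["fc"] != 0}                # assumed: reuse the sparser mask

merged = interpolate_weights(w_a, w_b, 0.5)
sparse_merged = apply_mask(merged, mask_a)
```

Because the interpolated weights replace a single subnetwork rather than being evaluated alongside it, inference cost is that of one model, which is the property the abstract contrasts with a prediction-level ensemble.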