The Lottery Ticket Hypothesis (LTH) has raised keen attention to identifying sparse trainable subnetworks, or winning tickets, which can be trained in isolation to achieve performance similar to or even better than that of the full models. Despite many efforts, the most effective method for identifying such winning tickets remains Iterative Magnitude-based Pruning (IMP), which is computationally expensive and has to be re-run thoroughly for every different network. A natural question arises: can we "transform" the winning ticket found in one network into one for another network with a different architecture, yielding a winning ticket for the latter from the start, without re-doing the expensive IMP? Answering this question is not only practically relevant for efficient "once-for-all" winning ticket finding, but also theoretically appealing for uncovering inherently scalable sparse patterns in networks. We conduct extensive experiments on CIFAR-10 and ImageNet, and propose a variety of strategies to tweak the winning tickets found in different networks of the same model family (e.g., ResNets). Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers of one network, its corresponding winning ticket can be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as the latter's winning ticket found directly by IMP. We also thoroughly compare E-LTH with pruning-at-initialization and dynamic sparse training methods, and discuss the generalizability of E-LTH to different model families, layer types, and across datasets. Code is available at https://github.com/VITA-Group/ElasticLTH.
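To make the "stretch/squeeze" idea concrete, the following is a minimal sketch, assuming winning-ticket masks are stored per residual block within each stage of a ResNet (e.g., three stages of three blocks for a ResNet-20-style model). It illustrates replicating block-level masks to fit a deeper network and dropping them to fit a shallower one; the function names, the round-robin replication order, and the trailing-block dropping rule are illustrative assumptions, not the paper's exact recipe (see the released code for that).

```python
# Sketch of stretching/squeezing a winning-ticket mask across depths of the same
# ResNet family. Masks are represented abstractly (any per-layer object); in practice
# they would be binary tensors matching the convolution weights of each block.
import copy
from typing import Any, Dict, List

BlockMask = Dict[str, Any]  # layer name within a block -> binary mask


def stretch_stage(stage_masks: List[BlockMask], target_blocks: int) -> List[BlockMask]:
    """Replicate block-level masks (round-robin) until the stage has `target_blocks` blocks."""
    assert target_blocks >= len(stage_masks)
    out = [copy.deepcopy(m) for m in stage_masks]
    i = 0
    while len(out) < target_blocks:
        out.append(copy.deepcopy(stage_masks[i % len(stage_masks)]))
        i += 1
    return out


def squeeze_stage(stage_masks: List[BlockMask], target_blocks: int) -> List[BlockMask]:
    """Keep only the first `target_blocks` block-level masks (one possible dropping rule)."""
    assert 0 < target_blocks <= len(stage_masks)
    return [copy.deepcopy(m) for m in stage_masks[:target_blocks]]


def transform_ticket(stages: List[List[BlockMask]],
                     target_blocks_per_stage: List[int]) -> List[List[BlockMask]]:
    """Stretch or squeeze a source ticket, stage by stage, to fit a deeper/shallower network."""
    new_stages = []
    for masks, target in zip(stages, target_blocks_per_stage):
        if target >= len(masks):
            new_stages.append(stretch_stage(masks, target))
        else:
            new_stages.append(squeeze_stage(masks, target))
    return new_stages


if __name__ == "__main__":
    # Toy example: a 3x3-block ticket (ResNet-20-like) stretched to 3x9 blocks (ResNet-56-like).
    src = [[{"conv1": f"mask_s{s}_b{b}"} for b in range(3)] for s in range(3)]
    dst = transform_ticket(src, [9, 9, 9])
    print([len(stage) for stage in dst])  # -> [9, 9, 9]
```

The resulting block-level masks would then be applied to the deeper (or shallower) network's initialization before training it in isolation, mirroring the standard LTH protocol.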