The Lottery Ticket Hypothesis (LTH) has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets, which can be trained in isolation to achieve similar or even better performance than the full models. Despite many efforts, the most effective method for identifying such winning tickets is still Iterative Magnitude-based Pruning (IMP), which is computationally expensive and has to be run in full for every different network. A natural question arises: can we "transform" the winning ticket found in one network into one for another network with a different architecture, yielding a winning ticket for the latter from the start, without redoing the expensive IMP? Answering this question is not only practically relevant for efficient "once-for-all" winning ticket finding, but also theoretically appealing for uncovering inherently scalable sparse patterns in networks. We conduct extensive experiments on CIFAR-10 and ImageNet, and propose a variety of strategies to tweak the winning tickets found from different networks of the same model family (e.g., ResNets). Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers of one network, its corresponding winning ticket can be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as that of the latter's winning ticket found directly by IMP. We also extensively compare E-LTH with pruning-at-initialization and dynamic sparse training methods, and discuss the generalizability of E-LTH to different model families, layer types, and datasets. Code is available at https://github.com/VITA-Group/ElasticLTH.
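To make the stretching and squeezing idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a winning ticket is stored as one binary mask per residual block within a stage, that stretching replicates blocks evenly while preserving their order, and that squeezing drops the trailing blocks. The names `stretch_stage`, `squeeze_stage`, and `sparsity` are hypothetical, and the specific replication and dropping rules are illustrative choices; the abstract only states that several such strategies were studied.

```python
from typing import List, Sequence

# One block's binary mask: 1 keeps a weight, 0 prunes it (toy 2-D layout).
Mask = List[List[int]]


def _copy(mask: Mask) -> Mask:
    """Return an independent copy of a block mask."""
    return [row[:] for row in mask]


def stretch_stage(masks: Sequence[Mask], target_blocks: int) -> List[Mask]:
    """Stretch a stage's masks to a deeper stage by replicating blocks.

    Assumption: target block i reuses source block floor(i * n / target),
    which spreads replicas evenly and keeps the original block order.
    """
    n = len(masks)
    if n == 0 or target_blocks < n:
        raise ValueError("target depth must be >= a non-empty source depth")
    return [_copy(masks[i * n // target_blocks]) for i in range(target_blocks)]


def squeeze_stage(masks: Sequence[Mask], target_blocks: int) -> List[Mask]:
    """Squeeze a stage's masks to a shallower stage by dropping blocks.

    Assumption: the first target_blocks blocks are kept, the rest dropped.
    """
    if not 0 < target_blocks <= len(masks):
        raise ValueError("target depth must be in 1..len(masks)")
    return [_copy(masks[i]) for i in range(target_blocks)]


def sparsity(masks: Sequence[Mask]) -> float:
    """Fraction of pruned weights across all blocks of the stage."""
    total = sum(len(row) for m in masks for row in m)
    kept = sum(sum(row) for m in masks for row in m)
    return 1.0 - kept / total


if __name__ == "__main__":
    # Toy source stage with three blocks, each holding a 1x2 mask.
    source = [[[1, 0]], [[0, 0]], [[1, 1]]]
    deeper = stretch_stage(source, 5)     # blocks reused: 0, 0, 1, 1, 2
    shallower = squeeze_stage(source, 2)  # blocks kept: 0, 1
    print(len(deeper), len(shallower), sparsity(source), sparsity(deeper))
```

Note that replication changes the overall sparsity whenever the source blocks are not equally sparse, so a transferred ticket would still need to be trained and evaluated on the target network before being compared with a ticket found directly by IMP.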