Current dynamic networks and dynamic pruning methods have shown their promising capability in reducing theoretical computation complexity. However, dynamic sparse patterns on convolutional filters fail to achieve actual acceleration in real-world implementations, due to the extra burden of indexing, weight-copying, or zero-masking. Here, we explore a dynamic network slimming regime, named Dynamic Slimmable Network (DS-Net), which aims to achieve good hardware efficiency by dynamically adjusting the number of filters at test time with respect to different inputs, while keeping filters stored statically and contiguously in hardware to avoid this extra burden. DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate, which comprises an attention head and a slimming head to predictively adjust network width with negligible extra computation cost. To ensure the generality of each candidate architecture and the fairness of the gate, we propose a disentangled two-stage training scheme inspired by one-shot NAS. In the first stage, a novel training technique for weight-sharing networks, named In-place Ensemble Bootstrapping, is proposed to improve supernet training efficacy. In the second stage, Sandwich Gate Sparsification is proposed to assist gate training by identifying easy and hard samples online. Extensive experiments demonstrate that DS-Net consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods by a large margin (up to 5.9%). Typically, DS-Net achieves 2-4x computation reduction and 1.62x real-world acceleration over ResNet-50 and MobileNet with minimal accuracy drops on ImageNet. Code release: https://github.com/changlin31/DS-Net .
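To make the core idea concrete, below is a minimal PyTorch sketch (not the authors' implementation) of how a slimmable convolution can run at a dynamically chosen width by slicing its contiguously stored weight tensor, avoiding the indexing, weight-copying, or zero-masking overhead described above. The class and parameter names (SlimmableConv2d, DoubleHeadedGate, width_mult, widths) are illustrative assumptions, and the gate is heavily simplified relative to the paper's design.

```python
# A minimal sketch, assuming a PyTorch setting. Filters live in one
# contiguous weight tensor; a gate selects a width at test time and the
# conv simply slices that tensor, so no extra memory traffic is incurred.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlimmableConv2d(nn.Conv2d):
    """Conv layer that can run at a fraction of its full output width."""

    def forward(self, x, width_mult=1.0):
        out_ch = max(1, int(self.out_channels * width_mult))
        in_ch = x.shape[1]  # match whatever width the previous layer produced
        weight = self.weight[:out_ch, :in_ch]           # contiguous slice, no copy
        bias = self.bias[:out_ch] if self.bias is not None else None
        return F.conv2d(x, weight, bias, self.stride, self.padding)


class DoubleHeadedGate(nn.Module):
    """Toy gate with a slimming head (hard width choice) and an
    attention head (soft channel reweighting); details are simplified
    relative to the paper's double-headed dynamic gate."""

    def __init__(self, channels, widths=(0.25, 0.5, 0.75, 1.0)):
        super().__init__()
        self.widths = widths
        self.slim = nn.Linear(channels, len(widths))    # slimming head
        self.att = nn.Linear(channels, channels)        # attention head

    def forward(self, x):
        pooled = x.mean(dim=(2, 3))                     # global average pool
        idx = self.slim(pooled).argmax(dim=1)           # per-input width decision
        attention = torch.sigmoid(self.att(pooled))     # channel attention
        x = x * attention.unsqueeze(-1).unsqueeze(-1)
        # For brevity, apply the first sample's decision to the whole batch;
        # true dynamic inference would route each sample independently.
        return x, self.widths[idx[0].item()]


# Usage: the gate picks a width, then the slimmable conv slices to it.
conv = SlimmableConv2d(16, 32, kernel_size=3, padding=1)
gate = DoubleHeadedGate(16)
x = torch.randn(2, 16, 8, 8)
x, width = gate(x)
y = conv(x, width_mult=width)   # e.g. width 0.5 -> only 16 of 32 filters run
```

Because the selected filters are a leading slice of one static tensor, the reduced-width forward pass maps directly to a smaller dense convolution on real hardware, which is what enables actual (not just theoretical) acceleration.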