The Once-For-All (OFA) method offers an excellent pathway to deploy a trained neural network model onto multiple target platforms by utilising a supernet-subnet architecture. Once trained, a subnet (both architecture and trained weights) can be derived from the supernet and deployed directly to the target platform with little to no retraining or fine-tuning. To train the subnet population, OFA uses a novel training method called Progressive Shrinking (PS), which is designed to limit the negative impact of interference during training; it is believed that higher interference during training results in lower subnet population accuracies. In this work we take a second look at this interference effect. Surprisingly, we find that interference mitigation strategies do not have a large impact on overall subnet population performance. Instead, we find the subnet architecture selection bias during training to be a more important factor. To show this, we propose a simple-yet-effective method called Random Subnet Sampling (RSS), which does not mitigate the interference effect. Despite this, RSS produces a better-performing subnet population than PS on four small-to-medium-sized datasets, suggesting that the interference effect does not play a pivotal role in these datasets. Due to its simplicity, RSS provides a $1.9\times$ reduction in training time compared to PS. A $6.1\times$ reduction can also be achieved, with a reasonable drop in performance, when the number of RSS training epochs is reduced. Code is available at https://github.com/Jordan-HS/RSS-Interference-CVPRW2022.
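The core idea of Random Subnet Sampling can be sketched in a few lines: at every training step, one subnet configuration is drawn uniformly from the supernet's elastic dimensions and only that subnet receives a gradient update, with no progressive restriction of the sampling space. The sketch below is illustrative only; the elastic dimension names and choice lists (`DEPTH_CHOICES`, `WIDTH_CHOICES`, `KERNEL_CHOICES`) follow OFA-style search spaces and are assumptions, not the paper's exact configuration.

```python
import random

# Hypothetical elastic dimensions in an OFA-style supernet search space;
# the actual choices used in the paper's supernet may differ.
DEPTH_CHOICES = [2, 3, 4]
WIDTH_CHOICES = [3, 4, 6]
KERNEL_CHOICES = [3, 5, 7]


def sample_random_subnet(num_stages=5, rng=random):
    """Uniformly sample one subnet configuration from the full space.

    Unlike Progressive Shrinking, no dimension is frozen or shrunk in
    stages: every elastic dimension is re-sampled independently at every
    step, i.e. no interference mitigation is applied.
    """
    return {
        "depths": [rng.choice(DEPTH_CHOICES) for _ in range(num_stages)],
        "widths": [rng.choice(WIDTH_CHOICES) for _ in range(num_stages)],
        "kernels": [rng.choice(KERNEL_CHOICES) for _ in range(num_stages)],
    }


def train_epoch(train_step, num_steps=100, rng=random):
    """One RSS epoch: each step updates a freshly sampled random subnet."""
    for _ in range(num_steps):
        config = sample_random_subnet(rng=rng)
        # forward/backward pass through this subnet's weights only
        train_step(config)
```

Because sampling is uniform over the whole space throughout training, every subnet architecture has an equal chance of being updated at every epoch, which removes the selection bias that PS introduces by training large subnets first.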