Recently, the community has paid increasing attention to model scaling and has contributed to developing model families with a wide spectrum of scales. Current methods either simply resort to a one-shot NAS manner to construct a non-structural and non-scalable model family, or rely on a manual yet fixed scaling strategy to scale a base model that is not necessarily optimal for scaling. In this paper, we bridge these two components and propose ScaleNet to jointly search for the base model and the scaling strategy, so that the scaled large models can achieve more promising performance. Concretely, we design a super-supernet to embody models with a wide spectrum of sizes (e.g., FLOPs). Then, the scaling strategy can be learned interactively with the base model via a Markov chain-based evolution algorithm and generalized to develop even larger models. To obtain a decent super-supernet, we design a hierarchical sampling strategy to enhance its training sufficiency and alleviate the disturbance among sub-models. Experimental results show that our scaled networks enjoy significant performance superiority at various FLOPs levels, while reducing the search cost by at least 2.53x. Codes are available at https://github.com/luminolx/ScaleNet.
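To make the search procedure concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a Markov chain-style evolutionary search over scaling strategies. A strategy is assumed to be a (depth, width, resolution) multiplier triple drawn from small grids; each generation mutates surviving strategies, where each factor's transition depends only on its current value. The function `evaluate_strategy` is a hypothetical stand-in for querying the trained super-supernet for the accuracy of the correspondingly scaled sub-network.

```python
import random

# Hypothetical candidate grids for the three scaling factors.
DEPTHS = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
WIDTHS = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
RESOLUTIONS = [1.0, 1.15, 1.3, 1.45]


def evaluate_strategy(strategy):
    """Placeholder fitness: in practice this would measure the validation
    accuracy of the sub-network scaled by `strategy` inside the super-supernet."""
    d, w, r = strategy
    return -((d - 1.5) ** 2 + (w - 1.3) ** 2 + (r - 1.2) ** 2) + random.gauss(0, 0.01)


def mutate(strategy):
    """Markov-style transition: each factor may step to a neighbouring grid
    value, depending only on its current value."""
    def step(value, grid):
        idx = grid.index(value)
        if random.random() < 0.5:
            idx = min(max(idx + random.choice([-1, 1]), 0), len(grid) - 1)
        return grid[idx]

    d, w, r = strategy
    return (step(d, DEPTHS), step(w, WIDTHS), step(r, RESOLUTIONS))


def evolve(pop_size=20, generations=30, keep_top=5):
    """Evolve a population of scaling strategies and return the best one found."""
    population = [
        (random.choice(DEPTHS), random.choice(WIDTHS), random.choice(RESOLUTIONS))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        parents = sorted(population, key=evaluate_strategy, reverse=True)[:keep_top]
        # Refill the population by mutating the surviving parents.
        population = parents + [
            mutate(random.choice(parents)) for _ in range(pop_size - keep_top)
        ]
    return max(population, key=evaluate_strategy)


if __name__ == "__main__":
    print("best scaling strategy (depth, width, resolution):", evolve())
```

In the actual method, the fitness would come from the super-supernet rather than an analytic placeholder, and the searched strategy would then be generalized to produce even larger models.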