Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training

The width of a neural network matters since increasing the width will necessarily increase the model capacity. However, the performance of a network does not improve linearly with the width and soon gets saturated. Given this, we argue that increasing the number of networks (ensemble) can achieve better accuracy-efficiency trade-offs than purely increasing the width. To verify this, one large network is divided into several small ones regarding its parameters and regularization components. Each of these small networks has a fraction of the original one's parameters. We then train these small networks together and make them see various views of the same data to increase their diversity. During this co-training process, networks can also learn from each other. As a result, small networks can achieve better ensemble performance than the large one with few or no extra parameters or FLOPs, i.e., achieving better accuracy-efficiency trade-offs. Small networks can also achieve faster inference speed than the large one by running concurrently. All of the above shows that the number of networks is a new dimension of model scaling. We validate our argument with 8 different neural architectures on common benchmarks through extensive experiments. The code is available at https://github.com/FreeformRobotics/Divide-and-Co-training.
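The two ideas in the abstract, dividing one wide network into S narrower ones with about the same parameter budget and co-training them so they learn from each other, can be illustrated with a minimal sketch. It rests on stated assumptions, not on the authors' code. The abstract gives neither the exact split rule nor the co-training loss. Here we assume (1) the width is divided by sqrt(S), since convolution weights grow with the square of the width, and (2) agreement between networks is measured with a generalized Jensen-Shannon divergence, one common choice for mutual learning. All function names (`conv_weights`, `divided_width`, `ensemble`, `js_divergence`) are hypothetical.

```python
import math

def conv_weights(in_ch: int, out_ch: int, k: int = 3) -> int:
    """Weight count of a k x k convolution (bias ignored)."""
    return in_ch * out_ch * k * k

def divided_width(width: int, num_nets: int) -> int:
    """Hypothetical split rule (assumption): shrink the width by sqrt(S) so
    each of S small networks keeps roughly 1/S of the conv weights, because
    conv weights grow with the square of the width."""
    return max(1, round(width / math.sqrt(num_nets)))

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble(prob_list):
    """Ensemble prediction: average the class probabilities of all networks."""
    n = len(prob_list)
    return [sum(p[c] for p in prob_list) / n for c in range(len(prob_list[0]))]

def entropy(p):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def js_divergence(prob_list):
    """Generalized Jensen-Shannon divergence (assumed co-training term):
    entropy of the averaged prediction minus the average entropy.
    It is 0 when all networks agree and grows as they disagree, so
    minimizing it alongside the task loss pushes the networks to agree."""
    avg_entropy = sum(entropy(p) for p in prob_list) / len(prob_list)
    return entropy(ensemble(prob_list)) - avg_entropy

if __name__ == "__main__":
    large = conv_weights(256, 256)
    w = divided_width(256, 4)
    print(large, w, 4 * conv_weights(w, w))  # 589824 128 589824
    # Illustrative logits from two small networks fed different views of one image.
    p1 = softmax([2.0, 0.5, -1.0])
    p2 = softmax([1.5, 1.0, -0.5])
    print(ensemble([p1, p2]), js_divergence([p1, p2]))
```

For example, splitting a 256-channel 3 x 3 convolution into S = 4 networks gives 128 channels each, and 4 x 128 x 128 x 9 = 589,824 weights, exactly the 256 x 256 x 9 of the original layer. When sqrt(S) does not divide the width evenly, rounding introduces small deviations. This is consistent with the abstract's "few or no extra parameters".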

