Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or on edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static, pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer vision tasks. We conduct an extensive experimental evaluation of our training paradigm on several large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet dataset. We further propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights of each layer of the smaller network. DAN achieves ResNet-50-level accuracy on ImageNet with $3.8\times$ fewer parameters and $2.2\times$ fewer FLOPs.
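To make the weight-sharing idea concrete, below is a minimal sketch of one way an adjoined layer and joint objective could look in PyTorch. It assumes slimmable-style sharing, where the small path reuses the first channel-slice of each base layer's filters, and a joint loss in which both paths fit the labels while the small path additionally matches the base path's soft outputs. The names `AdjoinedConv`, `adjoined_loss`, `small_out_ch`, and the weighting `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only; not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjoinedConv(nn.Module):
    """Conv layer whose 'small' path reuses a slice of the 'base' path's weights."""
    def __init__(self, in_ch, out_ch, small_out_ch, k=3):
        super().__init__()
        self.base = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.small_out_ch = small_out_ch  # width of the compressed path (assumed fixed here)

    def forward(self, x_base, x_small):
        y_base = self.base(x_base)
        # Shared parameters: the small path uses the first `small_out_ch` filters,
        # restricted to the channels present in the small input.
        w = self.base.weight[: self.small_out_ch, : x_small.shape[1]]
        b = self.base.bias[: self.small_out_ch]
        y_small = F.conv2d(x_small, w, b, padding=self.base.padding)
        return y_base, y_small

def adjoined_loss(logits_base, logits_small, target, alpha=0.5):
    """Joint objective: both paths fit the labels; the small path also matches the base."""
    ce_base = F.cross_entropy(logits_base, target)
    ce_small = F.cross_entropy(logits_small, target)
    kd = F.kl_div(F.log_softmax(logits_small, dim=1),
                  F.softmax(logits_base.detach(), dim=1),
                  reduction="batchmean")
    return ce_base + ce_small + alpha * kd
```

Because the small path's filters are a literal slice of the base path's filters, a single backward pass through `adjoined_loss` updates one shared set of parameters, which is what lets the base network act as a regularizer for the compressed network (and vice versa) during joint training.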