Deep neural networks are typically too computationally expensive to run in real time on consumer-grade hardware and low-powered devices. In this paper, we investigate reducing the computational and memory requirements of neural networks through network pruning and quantisation. We examine their efficacy on large networks such as AlexNet compared with recent compact architectures: ShuffleNet and MobileNet. Our results show that pruning and quantisation compress these networks to less than half their original size and improve their efficiency, most notably yielding a 7x speedup on MobileNet. We also demonstrate that pruning, in addition to reducing the number of parameters in a network, can help correct overfitting.