While neural networks have advanced the frontiers in many machine learning applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is vital to integrating modern networks into edge devices with strict power and compute requirements. Neural network quantization is one of the most effective ways of achieving these savings, but the additional noise it induces can lead to accuracy degradation. In this white paper, we present an overview of neural network quantization using AI Model Efficiency Toolkit (AIMET). AIMET is a library of state-of-the-art quantization and compression algorithms designed to ease the effort required for model optimization and thus drive the broader AI ecosystem towards low latency and energy-efficient inference. AIMET provides users with the ability to simulate as well as optimize PyTorch and TensorFlow models. Specifically for quantization, AIMET includes various post-training quantization (PTQ, cf. chapter 4) and quantization-aware training (QAT, cf. chapter 5) techniques that guarantee near floating-point accuracy for 8-bit fixed-point inference. We provide a practical guide to quantization via AIMET by covering PTQ and QAT workflows, code examples and practical tips that enable users to efficiently and effectively quantize models using AIMET and reap the benefits of low-bit integer inference.
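As a preview of the workflows detailed in later chapters, the following is a minimal sketch of quantization simulation with AIMET's PyTorch API (aimet_torch). The QuantizationSimModel and compute_encodings calls follow AIMET's documented interface; the ResNet18 model and the random calibration batches are illustrative stand-ins, not part of the paper's experiments.

```python
# A minimal sketch of simulated 8-bit quantization with AIMET's PyTorch API.
# Assumes aimet_torch is installed; the calibration data below is random
# stand-in data -- real use would feed a few hundred representative samples.
import torch
from torchvision.models import resnet18

from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

model = resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Wrap the FP32 model with quantization ops that simulate 8-bit
# fixed-point weights and activations during the forward pass.
sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,
    default_param_bw=8,    # weight bit-width
    default_output_bw=8,   # activation bit-width
)

# Calibrate the quantization ranges (encodings) via forward passes.
def pass_calibration_data(sim_model, _):
    with torch.no_grad():
        for batch in [torch.randn(8, 3, 224, 224) for _ in range(4)]:
            sim_model(batch)

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=None)

# sim.model can now be evaluated with the usual pipeline to estimate
# on-target INT8 accuracy; sim.export(...) writes the model and encodings.
```

The same simulated model is also the starting point for the PTQ and QAT techniques covered later: PTQ refines the encodings and weights without labeled training, while QAT fine-tunes through the simulated quantization ops.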