Integer quantization of neural networks can be defined as the approximation of the high-precision computation of the canonical neural network formulation using reduced-precision integer arithmetic. It plays a significant role in the efficient deployment and execution of machine learning (ML) systems, reducing memory consumption and leveraging the typically faster integer computations. In this work, we present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies, which themselves are the foundation of many production ML systems. Our quantization strategy is accurate (e.g., it works well with post-training quantization), efficient and fast to execute (utilizing 8-bit integer weights and mostly 8-bit activations), and able to target a variety of hardware (by leveraging the instruction sets available in common CPU architectures, as well as available neural accelerators).
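To make the notion of "approximating high-precision computation with reduced-precision integers" concrete, the sketch below shows affine (scale and zero-point) quantization of a float tensor to 8-bit integers and the corresponding dequantization. This is a minimal, generic illustration in NumPy; the function names and implementation details are assumptions for exposition, not the specific strategy presented in the paper.

```python
import numpy as np

def quantize_affine_int8(x):
    """Affine quantization of a float tensor to int8 (illustrative sketch).

    Maps real values onto integers via q = round(x / scale) + zero_point,
    so that x is approximated by scale * (q - zero_point).
    """
    qmin, qmax = -128, 127
    # The representable range must include 0.0 so that zero is exact.
    x_min, x_max = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:  # all-zero tensor; avoid division by zero
        scale = 1.0
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float tensor."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: quantize a small weight matrix and inspect the reconstruction error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_affine_int8(w)
w_hat = dequantize(q, scale, zp)
print("max abs error:", np.abs(w - w_hat).max())
```

In an integer-only deployment, the float reconstruction above would not be materialized; instead, matrix multiplications run directly on the int8 values, with scales and zero-points folded into integer rescaling of the accumulators.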