Recurrent neural networks (RNNs) are used in many real-world text and speech applications. They include complex modules such as recurrence, exponential-based activations, gate interactions, unfoldable normalization, bi-directional dependence, and attention. The interaction between these elements prevents running them with integer-only operations without a significant drop in performance. Deploying RNNs that include layer normalization and attention with integer-only arithmetic remains an open problem. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear approximation of activations, so it can serve a wide range of RNNs across applications. We demonstrate the proposed method on RNN-based language models and automatic speech recognition. Our iRNN maintains performance comparable to its full-precision counterpart, and deploying it on smartphones improves runtime by $2\times$ and reduces model size by $4\times$.
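To illustrate the kind of integer-only activation the abstract refers to, the following is a minimal sketch (not the paper's exact algorithm) of a piecewise linear sigmoid whose slopes and intercepts are stored as fixed-point integers, so the activation is evaluated with integer arithmetic only. The knot placement, number of pieces, and Q8 fixed-point format are illustrative assumptions; an adaptive scheme would instead learn where to place the knots.

```python
import numpy as np

FRAC_BITS = 8              # Q8 fixed point: real value = int / 2**FRAC_BITS (assumed format)
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Quantize a float array to Q8 integers."""
    return np.round(np.asarray(x, dtype=np.float64) * SCALE).astype(np.int32)

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

# Knots chosen uniformly on [-8, 8] for illustration; an adaptive method would
# place them where the approximation error is largest.
knots = np.linspace(-8.0, 8.0, 17)
slopes = (sig(knots[1:]) - sig(knots[:-1])) / (knots[1:] - knots[:-1])
intercepts = sig(knots[:-1]) - slopes * knots[:-1]

# Integer lookup tables for the piecewise linear parameters.
knots_q = to_fixed(knots)
slopes_q = to_fixed(slopes)
intercepts_q = to_fixed(intercepts)

def int_sigmoid(x_q):
    """Evaluate the piecewise linear sigmoid on Q8 integer inputs, integer ops only."""
    x_q = np.clip(x_q, knots_q[0], knots_q[-1] - 1)
    idx = np.searchsorted(knots_q, x_q, side="right") - 1   # piece index per element
    # y = slope * x + intercept, with the product rescaled back to Q8.
    return (slopes_q[idx] * x_q) // SCALE + intercepts_q[idx]

x = np.linspace(-8.0, 8.0, 5)
print(int_sigmoid(to_fixed(x)) / SCALE)   # integer-only approximation
print(sig(x))                             # full-precision reference
```

In an actual integer-only RNN cell, the same idea would be applied to the other nonlinearities (e.g., tanh) and combined with quantization-aware training so the approximation error is absorbed during training rather than at deployment time.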