The Transformer and its variants have achieved great success in natural language processing. Because Transformer models are large, serving them is a challenge for real industrial applications. In this paper, we propose LightSeq, a highly efficient inference library for models in the Transformer family. LightSeq includes a series of GPU optimization techniques that both streamline the computation of Transformer layers and reduce the memory footprint. LightSeq supports models trained using PyTorch and TensorFlow. Experimental results on standard machine translation benchmarks show that LightSeq achieves up to 14x speedup compared with TensorFlow and 1.4x speedup compared with FasterTransformer, a concurrent CUDA implementation. The code has been released publicly at https://github.com/bytedance/lightseq.
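As a minimal usage sketch, the snippet below illustrates how an exported Transformer model might be served with the library's Python inference API, following the usage pattern documented in the repository README; the model file name, batch size, and token IDs here are placeholder assumptions, not values from the paper.

```python
import lightseq.inference as lsi

# Load an exported Transformer model (placeholder path) with a
# maximum batch size of 8; the protobuf file is produced by the
# library's export tools from a trained PyTorch/TensorFlow model.
model = lsi.Transformer("transformer.pb", 8)

# Run inference on a batch of tokenized source sentences.
# The token IDs below are arbitrary placeholders for illustration.
input_ids = [[63, 47, 65, 1507, 88, 74, 10, 2]]
output_ids = model.infer(input_ids)
print(output_ids)
```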