In this paper, we propose Shallow Aggressive Decoding (SAD) to improve the online inference efficiency of the Transformer for instantaneous Grammatical Error Correction (GEC). SAD optimizes online inference efficiency for GEC through two innovations: 1) it aggressively decodes as many tokens as possible in parallel, instead of always decoding only one token per step, to improve computational parallelism; 2) it uses a shallow decoder, instead of the conventional Transformer architecture with balanced encoder-decoder depth, to reduce the computational cost during inference. Experiments on both English and Chinese GEC benchmarks show that aggressive decoding yields the same predictions as greedy decoding but with a significant speedup for online inference. Its combination with the shallow decoder offers an even higher online inference speedup over the powerful Transformer baseline without quality loss. Not only does our approach allow a single model to achieve state-of-the-art results on English GEC benchmarks: 66.4 F0.5 on CoNLL-14 and 72.9 F0.5 on the BEA-19 test set with an almost 10x online inference speedup over the Transformer-big model, but it is also easily adapted to other languages. Our code is available at https://github.com/AutoTemp/Shallow-Aggressive-Decoding.
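To illustrate the aggressive decoding idea, the toy sketch below uses a dictionary-based stand-in for the model (the `CORRECTIONS` table and `predict_next` are illustrative assumptions, not the paper's implementation, which verifies an entire draft with one batched Transformer forward pass). Since GEC outputs mostly copy the source, the decoder can draft the remaining source tokens, verify all of them at once, accept the longest agreeing prefix, and re-draft only from the first disagreement, reproducing greedy decoding's output exactly:

```python
# Toy stand-in for a GEC model: predict the next target token given the
# source and the target prefix. Real SAD uses a Transformer decoder; this
# substitution table is a hypothetical example for demonstration only.
CORRECTIONS = {"goed": "went", "a": "an"}

def predict_next(src, prefix):
    """Greedy next-token prediction for the toy model (copy or correct)."""
    i = len(prefix)
    if i >= len(src):
        return "<eos>"
    return CORRECTIONS.get(src[i], src[i])

def greedy_decode(src):
    """Conventional decoding: one token per step."""
    out = []
    while True:
        tok = predict_next(src, out)
        if tok == "<eos>":
            return out
        out.append(tok)

def aggressive_decode(src):
    """Aggressively draft the rest of the source, verify it, and only
    re-decode from the first position where the model disagrees."""
    out = []
    while True:
        draft = src[len(out):] + ["<eos>"]
        # In the real method, one batched forward pass scores every draft
        # position simultaneously; this loop simulates that parallel check.
        preds, prefix = [], list(out)
        for tok in draft:
            preds.append(predict_next(src, prefix))
            prefix.append(tok)
        # Accept the longest draft prefix the model agrees with.
        k = 0
        while k < len(draft) and preds[k] == draft[k] and draft[k] != "<eos>":
            out.append(draft[k])
            k += 1
        if preds[k] == "<eos>":
            return out
        out.append(preds[k])  # take the model's token at the mismatch, redraft
```

In this toy example, `aggressive_decode("i goed to a office".split())` accepts long copied spans in a few verification rounds and returns exactly what `greedy_decode` returns, mirroring the paper's guarantee that aggressive decoding matches greedy decoding's predictions while decoding many tokens per step.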