Although non-autoregressive models with one-iteration generation achieve remarkable inference speed-up, they still fall behind their autoregressive counterparts in prediction accuracy. The most accurate non-autoregressive models currently rely on multiple decoding iterations, which largely sacrifices their inference speed advantage. Inspired by how autoregressive and iterative-decoding models learn word dependencies, we propose the Glancing Transformer (GLAT) with a glancing language model (GLM), which learns to capture word dependencies gradually. Experiments on three benchmarks demonstrate that our approach significantly improves the accuracy of non-autoregressive models without multiple decoding iterations. In particular, GLAT achieves state-of-the-art results among non-iterative models and even outperforms top iterative counterparts on some benchmarks.
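To make the idea of "gradually capturing word dependencies" concrete, below is a minimal sketch of a glancing-style training step: a first fully parallel pass predicts the target, and the more mistakes it makes, the more reference tokens are revealed to the decoder inputs for a second pass. This is not the authors' code; the `decoder` and `ref_embed` callables, their signatures, and the fixed `ratio` are hypothetical assumptions for illustration.

```python
# A minimal sketch of glancing sampling during training (assumptions noted above).
import torch


def glancing_sample(decoder, src_hiddens, dec_inputs, ref_tokens, ref_embed, ratio=0.5):
    # First pass: fully parallel prediction with no observed target words.
    with torch.no_grad():
        pred_tokens = decoder(src_hiddens, dec_inputs).argmax(-1)   # [B, T]

    # The worse the prediction, the more reference words are "glanced at".
    n_wrong = (pred_tokens != ref_tokens).sum(-1)                   # errors per sentence
    n_glance = (n_wrong.float() * ratio).long()                     # tokens to reveal

    # Randomly pick n_glance positions per sentence to reveal.
    scores = torch.rand(ref_tokens.shape, device=ref_tokens.device)
    ranks = scores.argsort(-1).argsort(-1)                          # rank of each position
    glance_mask = ranks < n_glance.unsqueeze(-1)                    # [B, T] bool

    # Replace decoder inputs at revealed positions with reference embeddings.
    mixed_inputs = torch.where(glance_mask.unsqueeze(-1),
                               ref_embed(ref_tokens), dec_inputs)

    # A second parallel pass (outside this sketch) predicts the remaining positions;
    # the training loss is computed only where glance_mask is False.
    return mixed_inputs, ~glance_mask
```

As training progresses and the first-pass predictions improve, fewer reference tokens are revealed, so the model is pushed to generate the whole sentence in a single parallel pass at inference time.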