Auto-regressive neural sequence models have been shown to be effective across text generation tasks. However, their left-to-right decoding order prevents generation from being parallelized. The Insertion Transformer (Stern et al., 2019) is an attractive alternative that allows outputting multiple tokens in a single generation step. Nevertheless, due to the incompatibility between absolute positional encoding and insertion-based generation schemes, it needs to refresh the encoding of every token in the generated partial hypothesis at each step, which can be costly. We design a novel reusable positional encoding scheme for Insertion Transformers called Fractional Positional Encoding (FPE), which allows reusing representations computed in previous steps. Empirical studies on various text generation tasks demonstrate the effectiveness of FPE, which leads to a reduction in floating-point operations and latency improvements in batched decoding.
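To make the contrast concrete, below is a minimal sketch (our illustration, not the paper's exact algorithm) of why absolute positions force re-encoding while fractional positions allow reuse: inserting a token under an absolute scheme shifts the integer position of every token to its right, whereas assigning the new token a position between its neighbours, here a simple midpoint rule used as an assumed example, leaves every existing position, and hence any representation cached for it, unchanged. The function names and the specific midpoint rule are illustrative assumptions.

# Sketch only: contrasts absolute positions, which shift on every insertion,
# with a hypothetical midpoint-based fractional assignment that never moves
# existing tokens' positions.

def insert_absolute(positions, slot):
    # Absolute positions: every token at or after `slot` gets a new integer
    # position, so its positional encoding (and cached state) is invalidated.
    new_token_pos = positions[slot - 1] + 1 if slot > 0 else 0
    return positions[:slot] + [new_token_pos] + [p + 1 for p in positions[slot:]]

def insert_fractional(positions, slot):
    # Fractional positions (assumed midpoint rule): the new token is placed
    # between its left and right neighbours; existing positions never change,
    # so representations computed in earlier steps remain valid.
    left = positions[slot - 1] if slot > 0 else 0.0
    right = positions[slot] if slot < len(positions) else left + 2.0
    return positions[:slot] + [(left + right) / 2.0] + positions[slot:]

abs_pos = [0, 1, 2]            # partial hypothesis of three tokens
frac_pos = [1.0, 2.0, 3.0]

# Insert a new token into slot 1 (between the first and second tokens).
print(insert_absolute(abs_pos, 1))     # [0, 1, 2, 3]: the last two tokens moved
print(insert_fractional(frac_pos, 1))  # [1.0, 1.5, 2.0, 3.0]: old positions intact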