Auto-regressive neural sequence models have been shown to be effective across text generation tasks. However, their left-to-right decoding order prevents generation from being parallelized. The Insertion Transformer (Stern et al., 2019) is an attractive alternative that allows outputting multiple tokens in a single generation step. Nevertheless, due to the incompatibility between absolute positional encoding and insertion-based generation schemes, it needs to refresh the encoding of every token in the partial hypothesis at each generation step, which can be costly. We design a novel reusable positional encoding scheme for Insertion Transformers called Fractional Positional Encoding (FPE), which allows representations computed in previous steps to be reused. Empirical studies on various text generation tasks demonstrate the effectiveness of FPE, which reduces floating-point operations and improves decoding latency in batched decoding.
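To make the reuse argument concrete, the toy sketch below contrasts absolute position indices, which shift whenever a token is inserted, with a fractional scheme in which an inserted token takes a position between its neighbors so that previously assigned positions (and any encodings keyed on them) remain valid. The midpoint rule and the function names here are illustrative assumptions, not the paper's actual FPE formulation.

```python
# Illustrative sketch only: `insert_absolute` / `insert_fractional` are
# hypothetical helpers, and the midpoint rule is an assumed stand-in for
# the paper's Fractional Positional Encoding.

def insert_absolute(positions, slot):
    # Absolute indices: every token at or after `slot` is renumbered,
    # so its positional encoding must be recomputed after the insertion.
    return list(range(len(positions) + 1))

def insert_fractional(positions, slot):
    # Fractional positions: only the newly inserted token receives a fresh
    # position, taken here as the midpoint of its left and right neighbors
    # (sequence boundaries assumed at 0.0 and 1.0 for this toy example).
    left = positions[slot - 1] if slot > 0 else 0.0
    right = positions[slot] if slot < len(positions) else 1.0
    return positions[:slot] + [(left + right) / 2.0] + positions[slot:]

if __name__ == "__main__":
    pos = [0.25, 0.5, 0.75]                      # three already-generated tokens
    print(insert_absolute(list(range(3)), 1))    # [0, 1, 2, 3]: all tokens re-encoded
    print(insert_fractional(pos, 1))             # [0.25, 0.375, 0.5, 0.75]: old positions reused
```

Under this kind of scheme, the encoder/decoder states of already-generated tokens can be cached across insertion steps, which is the source of the floating-point operation and latency savings reported in the abstract.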