Lexically constrained text generation aims to control the generated text by incorporating pre-specified keywords into the output. Previous work injects lexical constraints into the output by controlling the decoding process or iteratively refining candidate outputs, which tends to produce generic or ungrammatical sentences and incurs high computational cost. To address these challenges, we propose Constrained BART (CBART) for lexically constrained text generation. CBART leverages the pre-trained model BART and transfers part of the generation burden from the decoder to the encoder by decomposing the task into two sub-tasks, thereby improving sentence quality. Concretely, we extend BART by adding a token-level classifier over the encoder, which instructs the decoder where to replace and insert. Guided by the encoder, the decoder refines multiple tokens of the input in one step by inserting tokens before specific positions and re-predicting tokens with low confidence. To further reduce inference latency, the decoder predicts all tokens in parallel. Experimental results on One-Billion-Word and Yelp show that CBART can generate plausible text with high quality and diversity while significantly accelerating inference.
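To make the encoder-side mechanism concrete, the sketch below shows one way a token-level classifier could sit on top of BART's encoder, labeling each input token with an edit action that the decoder can follow. This is a minimal illustration, not the authors' released implementation: the three-way label set (COPY/REPLACE/INSERT), the checkpoint name, and the single linear head are assumptions.

```python
# Minimal sketch (assumed design, not the official CBART code) of a
# token-level classifier over BART's encoder: one edit label per input token.
import torch.nn as nn
from transformers import BartModel, BartTokenizer

COPY, REPLACE, INSERT = 0, 1, 2  # hypothetical edit labels

class TokenLevelClassifier(nn.Module):
    def __init__(self, model_name="facebook/bart-base", num_labels=3):
        super().__init__()
        self.bart = BartModel.from_pretrained(model_name)
        # Linear head mapping each encoder hidden state to an edit label.
        self.head = nn.Linear(self.bart.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        # Run only BART's encoder over the keyword-bearing draft sequence.
        enc = self.bart.encoder(input_ids=input_ids,
                                attention_mask=attention_mask)
        # Per-token logits that guide the decoder's refinement step.
        return self.head(enc.last_hidden_state)

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = TokenLevelClassifier()
batch = tokenizer(["machine learning models"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
labels = logits.argmax(-1)  # per-token COPY / REPLACE / INSERT decisions
```

Under this reading, the decoder would consume the predicted labels to decide where to insert new tokens and which low-confidence tokens to re-predict, refining all positions in parallel rather than one token at a time.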