We propose to tackle conditional text generation tasks, especially those that require generating formulaic text, by splicing together segments of text from retrieved "neighbor" source-target pairs. Unlike recent work that conditions on retrieved neighbors in an encoder-decoder setting but generates text token by token, left to right, we learn a policy that directly manipulates segments of neighbor text (i.e., by inserting or replacing them) to form an output. Standard techniques for training such a policy require an oracle derivation for each generation, and we prove that finding the shortest such derivation can be reduced to parsing under a particular weighted context-free grammar. We find that policies learned in this way allow for interpretable table-to-text or headline generation that is competitive with neighbor-based token-level policies on automatic metrics, though on all but one dataset neighbor-based policies underperform a strong neighborless baseline. In all cases, however, generating by splicing is faster.
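To make the splicing idea concrete, the following is a minimal illustrative sketch, not the paper's implementation: it finds a shortest derivation of a target sentence as a sequence of copy/write actions over retrieved neighbor segments via a simple dynamic program, a simplified stand-in for the weighted context-free grammar reduction described above (the grammar in the paper also handles replacements and richer actions). All function names and the toy data below are our own assumptions.

```python
def shortest_splice_derivation(target, neighbors):
    """Return a minimal sequence of actions reconstructing `target`.

    Each action either copies a contiguous segment found in some neighbor
    (cost 1) or writes a single token from scratch (cost 1).  This is a
    toy oracle; the paper's oracle is recovered by parsing under a
    weighted CFG rather than by this flat segmentation DP.
    """
    # Collect every contiguous neighbor segment for O(1) membership checks.
    segments = set()
    for nb in neighbors:
        for i in range(len(nb)):
            for j in range(i + 1, len(nb) + 1):
                segments.add(tuple(nb[i:j]))

    n = len(target)
    best = [float("inf")] * (n + 1)   # best[j]: min actions to build target[:j]
    back = [None] * (n + 1)           # backpointer: (prev_index, action)
    best[0] = 0
    for j in range(1, n + 1):
        for i in range(j):
            span = tuple(target[i:j])
            if span in segments:            # splice a whole neighbor segment
                cand, act = best[i] + 1, ("copy", span)
            elif j - i == 1:                # fall back to writing one token
                cand, act = best[i] + 1, ("write", span)
            else:
                continue
            if cand < best[j]:
                best[j], back[j] = cand, (i, act)

    # Recover the derivation by following backpointers.
    actions, j = [], n
    while j > 0:
        i, act = back[j]
        actions.append(act)
        j = i
    return list(reversed(actions))


if __name__ == "__main__":
    neighbors = ["the city reported 12 new cases on monday".split()]
    target = "the city reported 47 new cases on friday".split()
    for action in shortest_splice_derivation(target, neighbors):
        print(action)
```

On this toy example the shortest derivation uses four actions: copy "the city reported", write "47", copy "new cases on", write "friday", which illustrates why a splicing policy can emit long formulaic stretches in a single step rather than token by token.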