In paraphrase generation, source sentences often contain phrases that should not be altered. Which phrases those are, however, can be context-dependent and can vary by application. Our solution to this challenge is to give the user explicit tags that can be placed around any arbitrary segment of text to mean "don't change me!" when generating a paraphrase; the model learns to copy these phrases verbatim to the output. The contribution of this work is a novel data generation technique using distant supervision that allows us to start from a pretrained sequence-to-sequence model and fine-tune a paraphrase generator that exhibits this behavior, enabling user-controllable paraphrase generation. Additionally, we modify the loss during fine-tuning to explicitly encourage diversity in model output. Our technique is language-agnostic, and we report experiments in English and Chinese.
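The tagging interface described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tag tokens `<keep>` and `</keep>` and both helper functions are hypothetical names chosen for this example.

```python
import re

def tag_protected_spans(sentence: str, spans: list[tuple[int, int]]) -> str:
    """Wrap character spans [start, end) that must be copied verbatim.

    The wrapped input is what would be fed to the sequence-to-sequence
    paraphrase model; the tags signal "don't change me!".
    """
    out, prev = [], 0
    for start, end in sorted(spans):
        out.append(sentence[prev:start])
        out.append("<keep>" + sentence[start:end] + "</keep>")
        prev = end
    out.append(sentence[prev:])
    return "".join(out)

def protected_phrases(tagged: str) -> list[str]:
    """Extract the phrases a generated paraphrase must contain unchanged."""
    return re.findall(r"<keep>(.*?)</keep>", tagged)

src = "The quick brown fox jumps over the lazy dog"
tagged = tag_protected_spans(src, [(4, 19)])  # protect "quick brown fox"
# tagged == "The <keep>quick brown fox</keep> jumps over the lazy dog"
```

A simple check that a model output respects the constraint is then to verify that every phrase in `protected_phrases(tagged)` appears as a substring of the generated paraphrase.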