Syntactically controlled paraphrase generation has emerged as a research direction in recent years. Most existing approaches require annotated paraphrase pairs for training and are thus costly to extend to new domains. Unsupervised approaches, on the other hand, do not need paraphrase pairs but perform relatively poorly in both syntactic control and the quality of the generated paraphrases. In this paper, we demonstrate that leveraging Abstract Meaning Representation (AMR) can greatly improve the performance of unsupervised syntactically controlled paraphrase generation. Our proposed model, the AMR-enhanced Paraphrase Generator (AMRPG), separately encodes the AMR graph and the constituency parse of the input sentence into two disentangled embeddings, one semantic and one syntactic. A decoder is then trained to reconstruct the input sentence from these two embeddings. Our experiments show that AMRPG generates more accurate syntactically controlled paraphrases, both quantitatively and qualitatively, than existing unsupervised approaches. We also demonstrate that the paraphrases generated by AMRPG can be used for data augmentation to improve the robustness of NLP models.