Paraphrase generation is a long-standing NLP task with diverse applications in downstream NLP tasks. However, the effectiveness of existing efforts predominantly relies on large amounts of gold-labeled data. Although unsupervised approaches have been proposed to address this issue, they may fail to generate meaningful paraphrases due to the lack of supervision signals. In this work, we go beyond the existing paradigms and propose a novel approach for generating high-quality paraphrases from weakly supervised data. Specifically, we tackle the weakly-supervised paraphrase generation problem by: (1) obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo paraphrase expansion; and (2) developing a meta-learning framework that progressively selects valuable samples for fine-tuning a pre-trained language model, i.e., BART, on the sentential paraphrasing task. We demonstrate that our approach achieves significant improvements over existing unsupervised approaches and performs comparably to supervised state-of-the-art methods.
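As a rough illustration of step (1), the sketch below builds weakly-labeled parallel sentences by retrieving, for each sentence in an unlabeled corpus, its most similar neighbours and treating them as pseudo paraphrases. The abstract does not specify the retrieval model, so TF-IDF cosine similarity, the `build_pseudo_pairs` helper, and the similarity thresholds are illustrative assumptions only, not the paper's exact procedure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_pseudo_pairs(corpus, top_k=1, min_sim=0.5, max_sim=0.95):
    """For each sentence, retrieve its nearest neighbours in the corpus and
    treat them as weakly-labeled (pseudo) paraphrase pairs. The similarity
    band filters out unrelated pairs (too low) and near-duplicates (too high).
    """
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform(corpus)
    sims = cosine_similarity(vectors)

    pairs = []
    for i, row in enumerate(sims):
        row[i] = -1.0  # exclude the sentence itself from its own candidates
        for j in row.argsort()[::-1][:top_k]:
            if min_sim <= row[j] <= max_sim:
                pairs.append((corpus[i], corpus[j], float(row[j])))
    return pairs


# Toy usage: the first two sentences should be retrieved as a weak pair.
weak_pairs = build_pseudo_pairs([
    "the cat sat on the mat",
    "a cat was sitting on the mat",
    "stock prices fell sharply today",
    "shares dropped steeply on the day",
])
```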
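For step (2), the selected weak pairs are used to fine-tune BART, with each pair weighted by how valuable the selection policy judges it to be. The sketch below shows only a weighted fine-tuning step on one batch, assuming per-sample weights are supplied externally (e.g., by the meta-learning selector); the `weighted_step` helper and the `facebook/bart-base` checkpoint are hypothetical choices for illustration, not the paper's exact setup.

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)


def weighted_step(sources, targets, sample_weights):
    """One fine-tuning step in which each weak (source, target) pair carries
    a weight in [0, 1], e.g., produced by a sample-selection policy."""
    enc = tokenizer(sources, return_tensors="pt", padding=True, truncation=True)
    labels = tokenizer(targets, return_tensors="pt", padding=True, truncation=True).input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

    outputs = model(input_ids=enc.input_ids,
                    attention_mask=enc.attention_mask,
                    labels=labels)
    logits = outputs.logits  # (batch, seq_len, vocab)

    # Per-sample token-level cross-entropy, then weight each sample before averaging.
    loss_fct = torch.nn.CrossEntropyLoss(ignore_index=-100, reduction="none")
    token_loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
    token_loss = token_loss.view(labels.size(0), -1)
    mask = (labels != -100).float()
    per_sample_loss = token_loss.sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    loss = (per_sample_loss * torch.tensor(sample_weights)).mean()

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

A selector that down-weights noisy retrieved pairs and up-weights reliable ones can thus steer fine-tuning toward the most valuable weak supervision, which is the role the abstract assigns to the meta-learning framework.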