Paraphrase Generation as Unsupervised Machine Translation

In this paper, we propose a new paradigm for paraphrase generation that treats the task as unsupervised machine translation (UMT). It rests on the assumption that a large-scale unlabeled monolingual corpus must contain pairs of sentences that express the same meaning. The proposed paradigm first splits a large unlabeled corpus into multiple clusters and trains multiple UMT models on pairs of these clusters. Then, using the paraphrase pairs produced by these UMT models, a unified surrogate model is trained to serve as the final seq2seq model for generating paraphrases. This model can be used directly at test time in the unsupervised setup, or fine-tuned on labeled datasets in the supervised setup. The proposed method has an advantage over machine-translation-based paraphrase generation methods because it does not rely on bilingual sentence pairs. It also allows humans to intervene in the model, so that more diverse paraphrases can be generated by applying different filtering criteria. Extensive experiments on existing paraphrase datasets in both the supervised and unsupervised setups demonstrate the effectiveness of the proposed paradigm.
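The pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' implementation. Several pieces are assumptions made only for illustration:

- the nearest-seed bag-of-words clustering;
- the `train_umt` factory, which stands in for actually training a UMT model;
- the use of ordered cluster pairs;
- the lexical-overlap filter, with its `min_overlap` and `max_overlap` thresholds.

The sketch stops once it has the filtered (source, paraphrase) pairs. Training the surrogate seq2seq model on those pairs is left out.

```python
import itertools
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Translator = Callable[[str], str]
# Hypothetical factory: "trains" a UMT model from a source cluster to a target cluster.
UMTTrainer = Callable[[Sequence[str], Sequence[str]], Translator]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sum(c * c for c in a.values()) ** 0.5
    norm_b = sum(c * c for c in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_corpus(corpus: Sequence[str], k: int) -> List[List[str]]:
    """Crude stand-in for clustering (an assumption): assign each sentence
    to the most similar of the first k sentences, used as seeds."""
    if k <= 0 or k > len(corpus):
        raise ValueError("k must be between 1 and len(corpus)")
    vectors = [Counter(s.lower().split()) for s in corpus]
    seeds = vectors[:k]
    clusters: List[List[str]] = [[] for _ in range(k)]
    for sentence, vec in zip(corpus, vectors):
        best = max(range(k), key=lambda i: cosine(vec, seeds[i]))
        clusters[best].append(sentence)
    return clusters


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard overlap between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def build_surrogate_data(
    corpus: Sequence[str],
    k: int,
    train_umt: UMTTrainer,
    min_overlap: float = 0.2,
    max_overlap: float = 0.9,
) -> List[Tuple[str, str]]:
    """Collect (source, paraphrase) pairs from every ordered cluster pair,
    keeping only candidates that pass the (hypothetical) overlap filter."""
    clusters = cluster_corpus(corpus, k)
    pairs: List[Tuple[str, str]] = []
    for i, j in itertools.permutations(range(k), 2):
        if not clusters[i] or not clusters[j]:
            continue
        translate = train_umt(clusters[i], clusters[j])
        for sentence in clusters[i]:
            candidate = translate(sentence)
            # Assumed filtering criterion: share some words, but not (nearly) all.
            if min_overlap <= jaccard(sentence, candidate) <= max_overlap:
                pairs.append((sentence, candidate))
    return pairs


if __name__ == "__main__":
    # Hypothetical stand-in for a trained UMT model: a tiny word-substitution table.
    synonyms = {"cat": "kitten", "stocks": "equities"}

    def stub_trainer(src: Sequence[str], tgt: Sequence[str]) -> Translator:
        return lambda s: " ".join(synonyms.get(w, w) for w in s.split())

    corpus = [
        "the cat sat on the mat",
        "stocks fell sharply today",
        "a cat sat on a mat",
        "shares fell sharply today",
    ]
    for source, paraphrase in build_surrogate_data(corpus, k=2, train_umt=stub_trainer):
        print(f"{source!r} -> {paraphrase!r}")
```

Changing `min_overlap` and `max_overlap` is one concrete way to apply "different filtering criteria." A lower `max_overlap`, for example, discards near-copies and keeps only more divergent candidates. In this demo, the unchanged sentence "shares fell sharply today" is dropped because its overlap with itself is 1.0.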
