Idiomatic expressions (IEs) play an essential role in natural language. In this paper, we study the task of idiomatic sentence paraphrasing (ISP), which aims to paraphrase a sentence containing an IE by replacing the IE with its literal paraphrase. The primary challenge for this task is the lack of large-scale corpora of idiomatic-literal parallel sentences, for which we consider two separate solutions. First, we propose an unsupervised approach to ISP that leverages an IE's contextual information and definition and does not require a parallel sentence training set. Second, we propose a weakly supervised approach that uses back-translation to jointly perform paraphrasing and generation of sentences with IEs, thereby enlarging the small-scale parallel sentence training dataset. Other significant derivatives of the study include a model that replaces a literal phrase in a sentence with an IE to generate an idiomatic expression, and a large-scale parallel dataset of idiomatic/literal sentence pairs. Compared with competitive baselines, the proposed solutions achieve relative gains of over 5.16 points in BLEU, over 8.75 points in METEOR, and over 19.57 points in SARI when the generated sentences are empirically validated on a parallel dataset using automatic and manual evaluations. We also demonstrate the practical utility of ISP as a preprocessing step in En-De machine translation.