Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision. In this paper, we formalize the implicit similarity function induced by this approach, and show that it is susceptible to non-paraphrase pairs sharing a single ambiguous translation. Based on these insights, we design an alternative similarity metric that mitigates this issue by requiring the entire translation distribution to match, and implement a relaxation of it through the Information Bottleneck method. Our approach incorporates an adversarial term into MT training in order to learn representations that encode as much information about the reference translation as possible, while keeping as little information about the input as possible. Paraphrases can be generated by decoding back to the source from this representation, without having to generate pivot translations. In addition to being more principled and efficient than round-trip MT, our approach offers an adjustable parameter to control the fidelity-diversity trade-off, and obtains better results in our experiments.
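The adversarial objective described above can be illustrated with a minimal sketch. All names here (`mt_loss`, `adversary_loss`, `beta`) are hypothetical and not taken from the paper's implementation; in practice the adversary term would be realized with gradient reversal inside MT training rather than a simple subtraction.

```python
def ib_objective(mt_loss: float, adversary_loss: float, beta: float) -> float:
    """Toy Information Bottleneck-style training objective.

    mt_loss: loss for predicting the reference translation (information
        about the translation should be kept, so this is minimized).
    adversary_loss: loss of an auxiliary model trying to recover the
        input from the learned representation (information about the
        input should be discarded, so this loss is *maximized*, here by
        subtracting it).
    beta: adjustable weight, corresponding to the knob that controls the
        fidelity-diversity trade-off mentioned above.
    """
    return mt_loss - beta * adversary_loss
```

With `beta = 0` this reduces to standard MT training; larger `beta` values push the representation to discard more input-specific information, trading fidelity for diversity in the generated paraphrases.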