ParaCotta:来自最多样化翻译样本的合成多语种多语种参数 (ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair) - 专知论文

会员服务 ·

0

Pair · 多样性 · 样本 · 语义相似度 · 束搜索 ·

2022 年 5 月 10 日

ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair

翻译：ParaCotta:来自最多样化翻译样本的合成多语种多语种参数

Alham Fikri Aji,Tirana Noor Fatyanosa,Radityo Eko Prasojo,Philip Arthur,Suci Fitriany,Salma Qonitah,Nadhifa Zulfa,Tomi Santoso,Mahendra Data

from arxiv, 10 pages, 3 figures, 6 tables. Accepted at PACLIC 2021. (ACL Anthology link: https://aclanthology.org/2021.paclic-1.56/)

We release our synthetic parallel paraphrase corpus across 17 languages: Arabic, Catalan, Czech, German, English, Spanish, Estonian, French, Hindi, Indonesian, Italian, Dutch, Romanian, Russian, Swedish, Vietnamese, and Chinese. Our method relies only on monolingual data and a neural machine translation system to generate paraphrases, hence simple to apply. We generate multiple translation samples using beam search and choose the most lexically diverse pair according to their sentence BLEU. We compare our generated corpus with the \texttt{ParaBank2}. According to our evaluation, our synthetic paraphrase pairs are semantically similar and lexically diverse.

翻译：我们通过17种语言(阿拉伯语、加泰罗尼亚语、捷克语、德语、英语、西班牙语、爱沙尼亚语、法语、印地语、印度尼西亚语、意大利语、荷兰语、罗马尼亚语、俄语、瑞典语、越南语和汉语)发布合成平行副句,我们的方法只依靠单语数据和神经机器翻译系统来生成副句,因此应用简便。我们利用光束搜索生成多个翻译样本,并根据BLEU的句子选择最有法则多样性的一对。我们把我们生成的副句子与\ textt{ParaBank2}作比较。根据我们的评估,我们的合成副句子在语义上相似,在词汇上也各不相同。

0

相关内容

Pair

两人亲密社交应用，官网： http://trypair.com/

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

同型半胱氨酸经ERK通路上调ETB受体表达促血管平滑肌细胞增殖机制

国家自然科学基金

0+阅读 · 2015年12月31日

应用斑马鱼模型研究CERKL突变引起视网膜变性的分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

垂直磁各向异性半金属铁磁电极到半导体自旋注入的研究

国家自然科学基金

0+阅读 · 2014年12月31日

一氧化氮调控植物细胞周期的机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

铁电性分子磁体的研究

国家自然科学基金

0+阅读 · 2008年12月31日

BertNet: Harvesting Knowledge Graphs from Pretrained Language Models

Arxiv

0+阅读 · 2022年6月28日

Extracting Targeted Training Data from ASR Models, and How to Mitigate It

Arxiv

0+阅读 · 2022年6月28日

From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization

Arxiv

0+阅读 · 2022年6月25日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

VIP会员

文章信息

相关主题

语义相似度

相关VIP内容

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型中的事件抽取：方法、模态与未来展望的全面综述

美海军作战管理系统：变革战场空间的二十年

【MIT博士论文】以语言为中心的医学影像理解

俄罗斯“沙希德”/“天竺葵”攻击无人机

相关资讯

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

相关论文

BertNet: Harvesting Knowledge Graphs from Pretrained Language Models

Arxiv

0+阅读 · 2022年6月28日

Extracting Targeted Training Data from ASR Models, and How to Mitigate It

Arxiv

0+阅读 · 2022年6月28日

From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization

Arxiv

0+阅读 · 2022年6月25日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

相关基金

同型半胱氨酸经ERK通路上调ETB受体表达促血管平滑肌细胞增殖机制

国家自然科学基金

0+阅读 · 2015年12月31日

应用斑马鱼模型研究CERKL突变引起视网膜变性的分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

垂直磁各向异性半金属铁磁电极到半导体自旋注入的研究

国家自然科学基金

0+阅读 · 2014年12月31日

一氧化氮调控植物细胞周期的机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

铁电性分子磁体的研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员