

August 27, 2022

MDIA: A Benchmark for Multilingual Dialogue Generation in 46 Languages


Qingyu Zhang,Xiaoyu Shen,Ernie Chang,Jidong Ge,Pengke Chen

From arXiv. The dataset and processing scripts are available at https://github.com/DoctorDream/mDIA

Owing to the lack of corpora for low-resource languages, current work on dialogue generation has mainly focused on English. In this paper, we present mDIA, the first large-scale multilingual benchmark for dialogue generation across low- to high-resource languages. It covers real-life conversations in 46 languages across 19 language families. We present baseline results obtained by fine-tuning the multilingual, non-dialogue-focused pre-trained model mT5 as well as the English-centric, dialogue-focused pre-trained chatbot DialoGPT. The results show that mT5-based models perform better on sacreBLEU and BERTScore but worse on diversity. Even though promising results are found in few-shot and zero-shot scenarios, there is a large gap between the generation quality in English and in other languages. We hope that the release of mDIA will encourage more work on multilingual dialogue generation to promote language diversity.
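The abstract reports a diversity score but does not say how it is computed. As an illustration only, the sketch below implements distinct-n, a commonly used diversity measure for generated dialogue: the number of unique n-grams divided by the total number of n-grams across all responses. Both the choice of distinct-n and the whitespace tokenization are assumptions made here, not details taken from the paper, and the function name `distinct_n` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def distinct_n(responses: Iterable[str], n: int = 1) -> float:
    """Return unique n-grams / total n-grams over all responses.

    Assumption: tokens are obtained by whitespace splitting, which is
    not adequate for languages written without spaces.
    """
    counts: Counter = Counter()
    for text in responses:
        tokens: List[str] = text.split()
        # Slide a window of size n over the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    total = sum(counts.values())
    # An empty input has no n-grams; report 0.0 instead of dividing by zero.
    return len(counts) / total if total else 0.0


if __name__ == "__main__":
    generic = ["i do not know", "i do not know"]
    varied = ["i do not know", "see you tomorrow then"]
    print(distinct_n(generic, 1))  # repeated replies share all unigrams
    print(distinct_n(varied, 1))   # no unigram is repeated
```

A model that keeps producing the same generic reply scores low on this measure, while one that produces varied replies scores close to 1. For a benchmark spanning 46 languages, a language-appropriate tokenizer would be needed in place of `str.split`.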

