AfriWOZ: 利用跨语言转移能力以生成非洲语言的低资源对话公司 (AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages) - 专知论文

会员服务 ·

0

任务对话系统 · MoDELS · 数据集 · Perplexity · 学成 ·

2022 年 5 月 19 日

AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages

翻译：AfriWOZ: 利用跨语言转移能力以生成非洲语言的低资源对话公司

Tosin Adewumi,Mofetoluwa Adeyemi,Aremu Anuoluwapo,Bukola Peters,Happy Buzaaba,Oyerinde Samuel,Amina Mardiyyah Rufai,Benjamin Ajibade,Tajudeen Gwadabe,Mory Moussou Koulibaly Traore,Tunde Ajayi,Shamsuddeen Muhammad,Ahmed Baruwa,Paul Owoicho,Tolulope Ogunremi,Phylis Ngigi,Orevaoghene Ahia,Ruqayya Nasir,Foteini Liwicki,Marcus Liwicki

from arxiv, 14 pages, 1 figure, 8 tables

Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda & Yor\`ub\'a. These datasets consist of 1,500 turns each, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate & analyze the effectiveness of modelling through transfer learning by utilziing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is the Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.

翻译：生成NLP 对话是一项重要的NLP 任务, 包含许多挑战。对非洲低资源语言而言, 挑战变得更为艰巨。为了建立非洲语言的对话代理机构, 我们为6种非洲语言( 斯瓦希里语、沃洛夫语、豪萨语、尼日利亚皮德金英语、基尼亚卢旺达语 & Yor ⁇ ub\'a ) 提供了第一批高质量的对话数据集。这些数据集由1500个旋转组成, 我们从英语多面多面多面多WOZ数据集中的一部分翻译出来。随后, 我们调查并分析建模的效果, 通过使用最先进的艺术( SoTA) 深单语言的单一语言传授来传授模型。我们通过自由的单面语言模式, 将模型与6种非洲语言( DialoGPT ) 和 BlenderBot 等高语言( BlenderBat) 。我们用简单后2eq 基准比较这些模型。除此之外, 我们使用多数的票和测量器对单面对话进行人文评价, 我们发现深单面单面模型可以理解各种语言的抽象的模型。我们观察类似对话, 以不同程度的6个公交式的公交点的公交的公交点, 。

0

相关内容

任务对话系统

任务对话系统

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Call for Nominations: 2022 Multimedia Prize Paper Award

Call for Nominations: 2022 Multimedia Prize Paper Award

CCF多媒体专委会

0+阅读 · 2022年2月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

多控制面机翼/外挂系统的跨音速气动伺服弹性研究

国家自然科学基金

0+阅读 · 2015年12月31日

COMET实验CDC软件发展和数据处理

国家自然科学基金

0+阅读 · 2014年12月31日

持续性高铁摄入对放牧绵羊铁吸收和母子代健康影响的研究

国家自然科学基金

0+阅读 · 2012年12月31日

铁基超导体的电子结构实验研究

国家自然科学基金

0+阅读 · 2012年12月31日

金属咔咯配合物的催化反应

国家自然科学基金

0+阅读 · 2012年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

活性骨支架多尺度三维快速精准组装机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

DegP (HtrA)的蛋白酶与分子伴侣活性之间功能转变的分子机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于含水孔隙弹性理论的声带振动和发声机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

A Medical Information Extraction Workbench to Process German Clinical Text

A Medical Information Extraction Workbench to Process German Clinical Text

Arxiv

0+阅读 · 2022年7月8日

Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

Arxiv

0+阅读 · 2022年7月7日

Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

Arxiv

0+阅读 · 2022年7月7日

AI-enhanced iterative solvers for accelerating the solution of large scale parametrized linear systems of equations

Arxiv

0+阅读 · 2022年7月6日

Large-Scale Hate Speech Detection with Cross-Domain Transfer

Arxiv

0+阅读 · 2022年7月5日

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Arxiv

20+阅读 · 2020年12月22日

A Survey of Knowledge-Enhanced Text Generation

Arxiv

18+阅读 · 2020年10月9日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

VIP会员

文章信息

相关主题

任务对话系统

相关VIP内容

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《毁灭算法：解析以色列在加沙的AI军事行动》

【COLT 2025最新教程】语言生成

以机器速度锁定目标：人工智能的能力与局限

【ICML2025】通过在线世界模型规划的持续强化学习

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Call for Nominations: 2022 Multimedia Prize Paper Award

Call for Nominations: 2022 Multimedia Prize Paper Award

CCF多媒体专委会

0+阅读 · 2022年2月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

A Medical Information Extraction Workbench to Process German Clinical Text

A Medical Information Extraction Workbench to Process German Clinical Text

Arxiv

0+阅读 · 2022年7月8日

Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

Arxiv

0+阅读 · 2022年7月7日

Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

Arxiv

0+阅读 · 2022年7月7日

AI-enhanced iterative solvers for accelerating the solution of large scale parametrized linear systems of equations

Arxiv

0+阅读 · 2022年7月6日

Large-Scale Hate Speech Detection with Cross-Domain Transfer

Arxiv

0+阅读 · 2022年7月5日

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Arxiv

20+阅读 · 2020年12月22日

A Survey of Knowledge-Enhanced Text Generation

Arxiv

18+阅读 · 2020年10月9日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

相关基金

多控制面机翼/外挂系统的跨音速气动伺服弹性研究

国家自然科学基金

0+阅读 · 2015年12月31日

COMET实验CDC软件发展和数据处理

国家自然科学基金

0+阅读 · 2014年12月31日

持续性高铁摄入对放牧绵羊铁吸收和母子代健康影响的研究

国家自然科学基金

0+阅读 · 2012年12月31日

铁基超导体的电子结构实验研究

国家自然科学基金

0+阅读 · 2012年12月31日

金属咔咯配合物的催化反应

国家自然科学基金

0+阅读 · 2012年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

活性骨支架多尺度三维快速精准组装机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

DegP (HtrA)的蛋白酶与分子伴侣活性之间功能转变的分子机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于含水孔隙弹性理论的声带振动和发声机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员