MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue


Understandability · Task-Oriented Dialogue Systems · Dataset · Domain · NLU

December 20, 2022



Nikita Moghe, Evgeniia Razumovskaia, Liane Guillou, Ivan Vulić, Anna Korhonen, Alexandra Birch

From arXiv. Release of Dataset v1.

Task-oriented dialogue (TOD) systems have been applied in a range of domains to help human users achieve specific goals. Systems are typically built for a single domain or language and do not generalise well beyond it. Their extension to other languages in particular is restricted by the lack of available training data for many of the world's languages. To support work on Natural Language Understanding (NLU) in TOD across multiple languages and domains simultaneously, we constructed MULTI3NLU++, a multilingual, multi-intent, multi-domain dataset. MULTI3NLU++ extends the English-only NLU++ dataset with manual translations into a range of high-, medium- and low-resource languages (Spanish, Marathi, Turkish and Amharic) in two domains (banking and hotels). MULTI3NLU++ inherits the multi-intent property of NLU++, where an utterance may be labelled with multiple intents; this gives a more realistic representation of a user's goals and aligns with the more complex tasks that commercial systems aim to model. We use MULTI3NLU++ to benchmark state-of-the-art multilingual language models, as well as Machine Translation and Question Answering systems, on the NLU task of intent detection for TOD systems in the multilingual setting. The results demonstrate the challenging nature of the dataset, particularly in the low-resource language setting.
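Because an utterance may carry several intents, intent detection on this kind of data is a multi-label problem: each intent is scored independently and every intent above a threshold is kept, instead of picking a single best label. The Python sketch below illustrates that framing, a QA-style reformulation with one yes/no question per intent, and micro-averaged F1 over intent sets. It is a minimal sketch under stated assumptions: the intent names, the 0.5 threshold, the question template, and the use of micro-F1 are all hypothetical and are not taken from the dataset's label set or the paper's evaluation protocol.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical intent labels for illustration only; they are NOT the dataset's label set.
INTENTS: List[str] = ["book", "cancel", "change", "request_info", "room"]


def predict_intents(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Multi-label decision rule: keep every intent whose score reaches the
    threshold, so an utterance may get zero, one, or several intents."""
    return {intent for intent, score in scores.items() if score >= threshold}


def as_yes_no_questions(utterance: str, intents: List[str]) -> List[Tuple[str, str]]:
    """QA-style framing (assumed): one (question, context) pair per intent.
    A yes/no answer to each question yields the same multi-label prediction."""
    # The question template is a hypothetical placeholder.
    return [(f"Does the user express the intent '{intent}'?", utterance) for intent in intents]


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool intent-level true positives, false positives and
    false negatives across all utterances before computing precision and recall."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 (this also avoids dividing by zero)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up scores for an utterance like "I'd like to change my booking and add a room".
    scores = {"book": 0.2, "cancel": 0.1, "change": 0.9, "request_info": 0.3, "room": 0.7}
    predicted = predict_intents(scores)
    print(sorted(predicted))  # ['change', 'room']

    gold = [{"change", "room"}, {"cancel"}]
    pred = [predicted, {"cancel", "request_info"}]
    print(round(micro_f1(gold, pred), 3))  # 0.857
```

Thresholding independent per-intent scores, rather than taking a softmax argmax, is what allows one utterance to receive several intents; the yes/no question view expresses the same decision as a set of binary QA problems.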

