Call Larisa Ivanovna: Code-Switching Fools Multilingual NLU Models - Zhuanzhi Papers


November 20, 2021

Call Larisa Ivanovna: Code-Switching Fools Multilingual NLU Models


Alexey Birshert,Ekaterina Artemova

From arXiv; accepted to AIST 2021

The practical needs of developing task-oriented dialogue assistants require the ability to understand many languages. Novel benchmarks for multilingual natural language understanding (NLU) include monolingual sentences in several languages, annotated with intents and slots. In such a setup, models for cross-lingual transfer show remarkable performance in joint intent recognition and slot filling. However, existing benchmarks lack code-switched utterances, which are difficult to gather and label because of their complex grammatical structure. The evaluation of NLU models therefore seems biased and limited, since code-switching is left out of scope. Our work adopts recognized methods to generate plausible and natural-sounding code-switched utterances and uses them to create a synthetic code-switched test set. Based on experiments, we report that state-of-the-art NLU models are unable to handle code-switching. At worst, performance, evaluated by semantic accuracy, drops from 80% to as low as 15% across languages. Furthermore, we show that pre-training on synthetic code-mixed data helps to maintain performance on the proposed test set at a level comparable to that on monolingual data. Finally, we analyze different language pairs and show that the closer the languages are, the better the NLU model handles their alternation. This is in line with the common understanding of how multilingual models transfer between languages.
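The abstract reports results in terms of semantic accuracy but does not define it. A minimal sketch follows, assuming the definition commonly used for joint intent recognition and slot filling: an utterance counts as correct only if its predicted intent and every predicted slot tag match the gold annotation. The function name, the toy labels, and the BIO-style tags are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def semantic_accuracy(
    gold_intents: Sequence[str],
    pred_intents: Sequence[str],
    gold_slots: Sequence[List[str]],
    pred_slots: Sequence[List[str]],
) -> float:
    """Fraction of utterances with a correct intent AND a fully correct slot sequence.

    Assumed definition (sentence-level exact match); the paper may differ in details.
    """
    sizes = {len(gold_intents), len(pred_intents), len(gold_slots), len(pred_slots)}
    if len(sizes) != 1:
        raise ValueError("all inputs must cover the same number of utterances")
    if not gold_intents:
        return 0.0
    correct = sum(
        g_int == p_int and g_sl == p_sl
        for g_int, p_int, g_sl, p_sl in zip(
            gold_intents, pred_intents, gold_slots, pred_slots
        )
    )
    return correct / len(gold_intents)


# Hypothetical toy data: two utterances with two tokens each.
gold_intents = ["flight", "airfare"]
pred_intents = ["flight", "airfare"]
gold_slots = [["O", "B-toloc"], ["O", "B-fromloc"]]
pred_slots = [["O", "B-toloc"], ["O", "O"]]  # one slot tag is wrong in utterance 2

score = semantic_accuracy(gold_intents, pred_intents, gold_slots, pred_slots)
print(score)  # 0.5: the second utterance fails despite a correct intent
```

Because a single wrong slot tag invalidates the whole utterance, this metric is stricter than intent accuracy or slot F1 taken separately, which is one reason it can drop sharply on code-switched input.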

