
January 31, 2021

Lexical Normalization for Code-switched Data and its Effect on POS-tagging


Rob van der Goot, Özlem Çetinoğlu

Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media. Yet the use of multiple languages in one utterance, also called code-switching (CS), is frequently overlooked by these normalization systems, despite being common on social media. In this paper, we propose three normalization models specifically designed to handle code-switched data, which we evaluate on two language pairs: Indonesian-English (Id-En) and Turkish-German (Tr-De). For the latter, we introduce novel normalization layers and their corresponding language ID and POS tags for the dataset, and evaluate the downstream effect of normalization on POS tagging. Results show that our CS-tailored normalization models outperform the Id-En state of the art and Tr-De monolingual models, and lead to a 5.4% relative performance increase for POS tagging compared to unnormalized input.
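To make the task concrete, here is a minimal sketch of language-aware lexical normalization. It is not one of the three models proposed in the paper: it assumes each token already carries a language ID and simply looks the word up in a per-language replacement table. The `LEXICONS` table, its entries, and the `normalize` function are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicons, one per language ID (not taken from the paper).
LEXICONS: Dict[str, Dict[str, str]] = {
    "id": {"yg": "yang", "tdk": "tidak"},
    "en": {"u": "you", "pls": "please"},
}


def normalize(tokens: List[Tuple[str, str]]) -> List[str]:
    """Normalize (word, language_id) pairs.

    Each word is looked up (lowercased) in the lexicon of its own language;
    words without an entry, or with an unknown language ID, are kept as-is.
    """
    normalized = []
    for word, lang in tokens:
        lexicon = LEXICONS.get(lang, {})
        normalized.append(lexicon.get(word.lower(), word))
    return normalized


if __name__ == "__main__":
    utterance = [("yg", "id"), ("u", "en"), ("Jakarta", "id")]
    print(normalize(utterance))
```

Selecting the lexicon by language ID is the point of the sketch: the same surface form can require different replacements, or none at all, depending on the language it is written in, which a single monolingual table cannot express.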

