CALCS 2021 Shared Task: Machine Translation for Code-Switched Data

To date, efforts in the code-switching literature have focused mostly on language identification, part-of-speech (POS) tagging, named entity recognition (NER), and syntactic parsing. In this paper, we address machine translation for code-switched social media data by creating a community shared task. We provide two modalities for participation: supervised and unsupervised. In the supervised setting, participants are challenged to translate English into Hindi-English (Eng-Hinglish) in a single direction. In the unsupervised setting, we provide the following language pairs: English and Spanish-English (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic (Eng-MSAEA) in both directions. We share insights and challenges in curating the "into" code-switching language evaluation data. Further, we provide baselines for all language pairs in the shared task. The leaderboard for the shared task comprises 12 individual system submissions from 5 different teams. The best performance achieved is a 12.67% BLEU score for English to Hinglish and a 25.72% BLEU score for MSAEA to English.

