To date, work in the code-switching literature has focused mostly on language identification, part-of-speech (POS) tagging, named entity recognition (NER), and syntactic parsing. In this paper, we address machine translation for code-switched social media data by creating a community shared task. The task offers two modalities for participation: supervised and unsupervised. In the supervised setting, participants are challenged to translate English into Hindi-English (Eng-Hinglish), in that direction only. In the unsupervised setting, we provide the following language pairs: English and Spanish-English (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic (Eng-MSAEA) in both directions. We share insights and challenges in curating evaluation data for translation "into" a code-switched language. Further, we provide baselines for all language pairs in the shared task. The leaderboard comprises 12 individual system submissions from 5 different teams. The best performance achieved is a BLEU score of 12.67% for English to Hinglish and 25.72% for MSAEA to English.