This paper presents our system, 'LIIR', for SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2). We participated in sub-task A for English, Danish, Greek, Arabic, and Turkish. We adapt and fine-tune the BERT and Multilingual BERT models made available by Google AI for English and the non-English languages, respectively. For English, we use a combination of two fine-tuned BERT models. For the other languages, we propose a cross-lingual augmentation approach to enrich the training data, and we use Multilingual BERT to obtain sentence representations. LIIR ranked 14/38, 18/47, 24/86, 24/54, and 25/40 in Greek, Turkish, English, Arabic, and Danish, respectively.