Language-specific pre-trained models have proven to be more accurate than multilingual ones in monolingual evaluation settings, and Arabic is no exception. However, we found that previously released Arabic BERT models were significantly under-trained. In this technical report, we present JABER, Junior Arabic BERt, our pre-trained language model prototype dedicated to Arabic. We conduct an empirical study to systematically evaluate the performance of models across a diverse set of existing Arabic NLU tasks. Experimental results show that JABER achieves state-of-the-art performance on ALUE, a new benchmark for Arabic Language Understanding Evaluation, as well as on a well-established NER benchmark.