The current dominance of deep neural networks in natural language processing is based on contextual embeddings such as ELMo, BERT, and BERT derivatives. Most existing work focuses on English; in contrast, we present here the first multilingual empirical comparison of two ELMo and several monolingual and multilingual BERT models using 14 tasks in nine languages. In monolingual settings, our analysis shows that monolingual BERT models generally dominate, with a few exceptions such as the dependency parsing task, where they are not competitive with ELMo models trained on large corpora. In cross-lingual settings, BERT models trained on only a few languages mostly do best, closely followed by massively multilingual BERT models.