The lack of a commonly used benchmark data set (collection) such as (Super-)GLUE (Wang et al., 2018, 2019) for evaluating non-English pre-trained language models is a severe shortcoming of current English-centric NLP research. It concentrates a large part of the research on English and neglects the uncertainty involved in transferring conclusions found for English to other languages. We evaluate the performance of the German and multilingual BERT-based models currently available via the Hugging Face transformers library on the four tasks of the GermEval17 workshop. We compare them to pre-BERT architectures (Wojatzki et al., 2017; Schmitt et al., 2018; Attia et al., 2018) as well as to an ELMo-based architecture (Biesialska et al., 2020) and a BERT-based approach (Guhr et al., 2020). We relate the observed improvements to those reported for similar tasks and similar models (pre-BERT vs. BERT-based) in English in order to draw tentative conclusions about whether these improvements are transferable to German or potentially other related languages.