Recently, pre-trained language models have advanced the field of natural language processing (NLP). The introduction of Bidirectional Encoder Representations from Transformers (BERT) and its optimized version RoBERTa has had a significant impact and increased the relevance of pre-trained models. Research in this field initially focused on English data, followed by models trained on multilingual text corpora. However, current research shows that multilingual models are inferior to monolingual models. To date, no German single-language RoBERTa model has been published; we introduce such a model (GottBERT) in this work. The German portion of the OSCAR data set was used as the text corpus. In an evaluation, we compare GottBERT's performance with existing German single-language BERT models and two multilingual models on the two Named Entity Recognition (NER) tasks CoNLL 2003 and GermEval 2014, as well as on the text classification tasks GermEval 2018 (fine and coarse) and GNAD. GottBERT was pre-trained with fairseq following the original RoBERTa pre-training procedure. All downstream tasks were trained using hyperparameter presets taken from the benchmark of German BERT, and the experiments were set up using FARM. Performance was measured by the $F_{1}$ score. GottBERT was successfully pre-trained on a 256-core TPU pod using the RoBERTa BASE architecture. Even without extensive hyperparameter optimization, GottBERT already outperforms all other tested German and multilingual models on all NER tasks and one text classification task. In order to support the German NLP field, we publish GottBERT under the AGPLv3 license.
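For reference, the $F_{1}$ score used throughout the evaluation is the harmonic mean of precision $P$ and recall $R$:
\[
F_{1} = \frac{2 \cdot P \cdot R}{P + R}
\]
For the NER tasks, $F_{1}$ is conventionally computed at the entity level following the respective shared-task evaluation schemes; the exact averaging details are those of the corresponding benchmarks, not additional choices introduced here.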