In this work we present the experiments that led to the creation of our BERT- and ELECTRA-based German language models, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM), we were able to attain SoTA performance across a set of document classification and named entity recognition (NER) tasks for both base and large models. We adopt an evaluation-driven approach in training these models, and our results indicate that both adding more data and utilizing WWM improve model performance. By benchmarking against existing German models, we show that our models are the best German language models to date. The trained models will be made publicly available to the research community.