Reliable methods for automatic readability assessment have the potential to impact a variety of fields, ranging from machine translation to self-informed learning. Recently, large language models for German (such as GBERT and GPT-2-Wechsel) have become available, enabling the development of deep-learning-based approaches that promise to further improve automatic readability assessment. In this contribution, we studied the ability of ensembles of fine-tuned GBERT and GPT-2-Wechsel models to reliably predict the readability of German sentences. We combined these models with linguistic features and investigated how prediction performance depends on ensemble size and composition. Mixed ensembles of GBERT and GPT-2-Wechsel models performed better than ensembles of the same size consisting of only GBERT or only GPT-2-Wechsel models. Our models were evaluated in the GermEval 2022 Shared Task on Text Complexity Assessment on a dataset of German sentences. On out-of-sample data, our best ensemble achieved a root mean squared error of 0.435.
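The evaluation pipeline described above can be illustrated with a minimal sketch: per-model readability predictions are averaged into an ensemble prediction, which is then scored with root mean squared error against gold complexity ratings. All model names, predictions, and gold scores below are hypothetical placeholders, not values from the shared task.

```python
import numpy as np

# Hypothetical per-model readability predictions for four sentences
# (member names and values are illustrative only).
member_predictions = {
    "gbert_1": np.array([2.1, 3.4, 1.8, 4.0]),
    "gbert_2": np.array([2.3, 3.2, 1.9, 3.8]),
    "gpt2_wechsel_1": np.array([2.0, 3.5, 1.7, 4.1]),
}

# Ensemble prediction: unweighted mean over the member models.
ensemble = np.mean(list(member_predictions.values()), axis=0)

# Root mean squared error against (hypothetical) gold complexity scores.
gold = np.array([2.2, 3.3, 1.8, 3.9])
rmse = float(np.sqrt(np.mean((ensemble - gold) ** 2)))
print(f"RMSE: {rmse:.3f}")
```

Averaging is only one simple way to combine members; the relative weighting of model types and the use of additional linguistic features would change the combination step in a full implementation.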