Text readability assessment has a wide range of applications for different target audiences, from language learners to people with disabilities. Web content is produced so fast that measuring its complexity at scale is impractical without machine learning and natural language processing techniques. Although many studies have addressed readability assessment of English text in recent years, models for other languages still leave room for improvement. In this paper, we propose a new model for assessing the complexity of German text based on transfer learning. Our results show that the model outperforms classical approaches that rely on extracting linguistic features from the input text. Our best model, built on the pre-trained BERT language model, achieves a Root Mean Square Error (RMSE) of 0.483.
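For reference, RMSE, the metric reported above, is the square root of the mean squared difference between predicted and gold complexity scores, so lower is better. The minimal Python sketch below shows the computation. The function name `rmse` and the sample scores are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Root Mean Square Error between predicted and gold complexity scores."""
    if len(predicted) != len(gold) or len(gold) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared per-item errors, then take the square root.
    mse = sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(gold)
    return math.sqrt(mse)


# Hypothetical model predictions vs. gold ratings for three sentences.
print(rmse([2.0, 3.5, 4.0], [2.5, 3.0, 4.0]))
```

Because errors are squared before averaging, RMSE penalizes a few large mistakes more heavily than many small ones.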