Traditional methods for readability assessment mainly rely on machine learning classifiers with hundreds of linguistic features. Although deep learning has become the dominant approach for almost all NLP tasks, it remains less explored for readability assessment. In this paper, we propose a BERT-based model with feature projection and length-balanced loss (BERT-FP-LBL) for readability assessment. Specifically, we present a new difficulty-knowledge-guided semi-supervised method to extract topic features that complement the traditional linguistic features. From the linguistic features, we employ projection filtering to extract orthogonal features that supplement the BERT representations. Furthermore, we design a new length-balanced loss to handle the widely varying length distribution of the data. Our model achieves state-of-the-art performance on two English benchmark datasets and one dataset of Chinese textbooks, and reaches near-perfect accuracy of 99\% on one English dataset. Moreover, in a consistency test, our model obtains results comparable to those of human experts.
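The abstract does not spell out how the orthogonal features are computed. As a minimal sketch, assuming the projection step is the standard vector rejection (subtracting from a linguistic feature vector its component along the BERT representation), the idea can be illustrated as follows. The function names are hypothetical, and the sketch assumes both vectors have already been mapped to the same dimension, which is not shown here.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_component(
    feature: Sequence[float], reference: Sequence[float]
) -> List[float]:
    """Return the part of `feature` that is orthogonal to `reference`.

    Assumption: `feature` stands in for a linguistic feature vector and
    `reference` for a BERT representation of the same dimension.
    """
    if len(feature) != len(reference):
        raise ValueError("feature and reference must have the same length")
    ref_norm_sq = dot(reference, reference)
    if ref_norm_sq == 0.0:
        # Nothing to project onto, so the feature is returned unchanged.
        return list(feature)
    # Scale of the projection of `feature` onto `reference`.
    scale = dot(feature, reference) / ref_norm_sq
    # Subtract the projected part, leaving only the orthogonal remainder.
    return [f - scale * r for f, r in zip(feature, reference)]


linguistic = [1.0, 2.0, 3.0]
bert_repr = [1.0, 1.0, 1.0]
filtered = orthogonal_component(linguistic, bert_repr)
```

Under this assumption, the filtered vector carries only information that is not already aligned with the BERT representation, which matches the stated goal of supplementing rather than duplicating it.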