Controllable text simplification is a crucial assistive technique for language learning and teaching. One of the primary factors hindering its advancement is the lack of a corpus annotated with sentence difficulty levels based on language ability descriptions. To address this problem, we created the CEFR-based Sentence Profile (CEFR-SP) corpus, which contains 17k English sentences annotated by English-education professionals with levels from the Common European Framework of Reference for Languages (CEFR). In addition, we propose a sentence-level assessment model that handles the unbalanced level distribution, since sentences at the most basic and the most proficient levels are naturally scarce. In our experiments, the method achieved a macro-F1 score of 84.5% in level assessment, outperforming strong baselines used in readability assessment.
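Macro-F1 is a natural metric when level distribution is unbalanced, because it averages per-level F1 scores without weighting by frequency, so scarce levels count as much as common ones. The following is a minimal sketch of that computation, not the evaluation code used for CEFR-SP; the function name `macro_f1` and the toy labels are assumptions made purely for illustration.

```python
from collections import Counter


def macro_f1(gold, pred, labels=None):
    """Unweighted mean of per-level F1 scores.

    `labels` defaults to the levels that occur in `gold` (an assumption
    of this sketch; other conventions are possible).
    """
    if labels is None:
        labels = sorted(set(gold))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted level p, but it was wrong
            fn[g] += 1  # gold level g was missed
    scores = []
    for level in labels:
        denom = 2 * tp[level] + fp[level] + fn[level]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[level] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy labels (not taken from the corpus): mid levels dominate.
gold = ["A1", "B1", "B1", "B1", "B1", "B2", "B2", "C2"]
always_b1 = ["B1"] * len(gold)  # a degenerate majority-level predictor

accuracy = sum(g == p for g, p in zip(gold, always_b1)) / len(gold)
print(f"accuracy = {accuracy:.3f}")
print(f"macro-F1 = {macro_f1(gold, always_b1):.3f}")
```

On this toy data the majority-level predictor is right half the time, yet its macro-F1 is far lower because it never identifies the scarce levels. That gap is why a frequency-insensitive metric suits sentence-level assessment with unbalanced levels.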