Assessing language proficiency is essential in education, as it enables instruction tailored to learners' needs. This paper investigates the use of Large Language Models (LLMs) for automatically classifying German texts into proficiency levels according to the Common European Framework of Reference for Languages (CEFR). To support robust training and evaluation, we construct a diverse dataset by combining multiple existing CEFR-annotated corpora with synthetic data. We then evaluate prompt-engineering strategies, fine-tuning of a LLaMA-3-8B-Instruct model, and a probing-based approach that uses the LLM's internal hidden states for classification. Our results show consistent performance improvements over prior methods, highlighting the potential of LLMs for reliable and scalable CEFR classification.