Grammar competency estimation is essential for assessing linguistic proficiency in both written and spoken language; the spoken modality, however, poses additional challenges due to its spontaneous, unstructured, and disfluent nature. Moreover, developing accurate grammar scoring models requires extensive expert annotation, making large-scale data creation impractical. To address these limitations, we propose a zero-shot grammar competency estimation framework that leverages unlabeled data and Large Language Models (LLMs) without relying on manual labels. During training, we prompt an LLM with grammar-competency rubrics to generate predictions on unlabeled data. These predictions, treated as pseudo labels, are used to train a transformer-based model through a novel training framework designed to handle label noise effectively. We show that the choice of LLM for pseudo-label generation critically affects model performance, and that the ratio of clean to noisy samples during training strongly influences stability and accuracy. Finally, a qualitative analysis of error intensity and score prediction confirms the robustness and interpretability of our approach. Experimental results demonstrate that our method estimates grammar competency scores with high accuracy, paving the way for scalable, low-resource grammar assessment systems.
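To make the pipeline described above concrete, the following is a minimal sketch of rubric-based pseudo-labeling followed by training a transformer-based scorer. The rubric text, the `score_with_rubric` helper, the `llm_call` callable, the `bert-base-uncased` backbone, and the plain MSE objective are all illustrative assumptions, not the authors' actual prompts, models, or noise-robust training framework.

```python
# Hedged sketch: rubric-prompted pseudo-labeling + transformer regression.
# All names and hyperparameters below are assumptions for illustration only.

from typing import Callable, List

import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

# Hypothetical rubric-based prompt; the paper's actual rubric is not shown here.
RUBRIC_PROMPT = (
    "You are an expert language assessor. Using the rubric below, rate the "
    "grammar competency of the transcript on a scale of 1 (poor) to 5 "
    "(excellent). Respond with a single number.\n"
    "Rubric: ...\n"
    "Transcript: {transcript}\n"
)


def score_with_rubric(llm_call: Callable[[str], str], transcript: str) -> float:
    """Query an LLM with a rubric-based prompt and parse a pseudo label.

    `llm_call` is any callable mapping a prompt string to a text reply;
    it stands in for whichever LLM API is used to generate pseudo labels.
    """
    reply = llm_call(RUBRIC_PROMPT.format(transcript=transcript))
    return float(reply.strip().split()[0])


class GrammarScorer(nn.Module):
    """Transformer encoder with a linear regression head over the [CLS] vector."""

    def __init__(self, backbone: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, **batch):
        hidden = self.encoder(**batch).last_hidden_state[:, 0]  # [CLS] token
        return self.head(hidden).squeeze(-1)


def train_on_pseudo_labels(texts: List[str], pseudo_scores: List[float],
                           epochs: int = 3, lr: float = 2e-5) -> GrammarScorer:
    """Fit the scorer on LLM-generated pseudo labels (single-batch toy loop)."""
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = GrammarScorer()
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    targets = torch.tensor(pseudo_scores, dtype=torch.float)
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    model.train()
    for _ in range(epochs):
        optim.zero_grad()
        preds = model(**batch)
        # Plain MSE here; the paper's noise-robust objective and its handling
        # of the clean-to-noisy sample ratio are not reproduced in this sketch.
        loss = nn.functional.mse_loss(preds, targets)
        loss.backward()
        optim.step()
    return model
```

A usage sketch would first map each unlabeled transcript through `score_with_rubric` to obtain pseudo scores, then pass the transcripts and scores to `train_on_pseudo_labels`; the noise-aware weighting that the abstract attributes to the proposed framework would replace the plain MSE step.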