This paper presents MuCGEC, a multi-reference, multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC), consisting of 7,063 sentences collected from three different Chinese-as-a-Second-Language (CSL) learner sources. Each sentence was corrected by three annotators, and their corrections were meticulously reviewed by an expert, resulting in 2.3 references per sentence. We conduct experiments with two mainstream CGEC models, i.e., the sequence-to-sequence (Seq2Seq) model and the sequence-to-edit (Seq2Edit) model, both enhanced with large pretrained language models (PLMs), achieving competitive benchmark performance on both previous datasets and ours. We also discuss CGEC evaluation methodologies, including the effect of multiple references and the use of a char-based metric. Our annotation guidelines, data, and code are available at \url{https://github.com/HillZhang1999/MuCGEC}.