The Chinese Spell Checking (CSC) task aims to detect and correct Chinese spelling errors. Recent work has focused on introducing character similarity from confusion sets to enhance CSC models, while ignoring the context of characters, which contains richer information. To make better use of contextual information, we propose a simple yet effective Curriculum Learning (CL) framework for the CSC task. With the help of our model-agnostic CL framework, existing CSC models are trained from easy to difficult, much as humans learn Chinese characters, and achieve further performance improvements. Extensive experiments and detailed analyses on the widely used SIGHAN datasets show that our method outperforms previous state-of-the-art methods. More instructively, our study empirically suggests that contextual similarity is more valuable than character similarity for the CSC task.
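The easy-to-difficult training order described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how difficulty is measured or how training is staged, so everything below is an assumption made for illustration: `count_errors` (a difficulty proxy that counts differing character positions), `curriculum_stages`, and the cumulative staging scheme are hypothetical names and choices, not the authors' method.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# (source sentence that may contain spelling errors, corrected sentence)
Example = Tuple[str, str]


def count_errors(example: Example) -> int:
    """Hypothetical difficulty proxy: number of aligned positions that differ.

    Assumes source and target have equal length; zip() silently ignores
    any trailing characters if they do not.
    """
    source, target = example
    return sum(1 for s, t in zip(source, target) if s != t)


def curriculum_stages(
    examples: Sequence[Example],
    difficulty: Callable[[Example], int],
    num_stages: int,
) -> Iterator[List[Example]]:
    """Yield cumulative training subsets ordered from easy to difficult.

    Stage k contains roughly the easiest k/num_stages share of the data,
    so the final stage covers the full training set.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(examples, key=difficulty)  # stable sort keeps ties in input order
    for stage in range(1, num_stages + 1):
        cutoff = max(1, len(ordered) * stage // num_stages)
        yield ordered[:cutoff]
```

Because the staging only reorders and subsets the data, a training loop for any CSC model could iterate over these stages and run an ordinary training pass on each subset, which is one way to read the "model-agnostic" claim.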