Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity

Chinese spelling check (CSC) is a fundamental NLP task that detects and corrects spelling errors in Chinese texts. As most of these spelling errors are caused by phonetic similarity, effectively modeling the pronunciation of Chinese characters is a key factor for CSC. In this paper, we introduce an auxiliary task of Chinese pronunciation prediction (CPP) to improve CSC and, for the first time, systematically discuss the adaptivity and granularity of this auxiliary task. We propose SCOPE, which builds two parallel decoders on top of a shared encoder, one for the primary CSC task and the other for a fine-grained auxiliary CPP task, with a novel adaptive weighting scheme to balance the two tasks. In addition, we design an iterative correction strategy that further improves results during inference. Empirical evaluation shows that SCOPE achieves new state-of-the-art results on three CSC benchmarks, demonstrating the effectiveness and superiority of the auxiliary CPP task. Comprehensive ablation studies further verify the positive effects of the adaptivity and granularity of the task. Code and data used in this paper are publicly available at https://github.com/jiahaozhenbang/SCOPE.
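The two ideas named in the abstract, a weighted joint objective over the CSC and CPP decoders and an iterative correction loop at inference, can be illustrated with a minimal sketch. It is not the SCOPE implementation. The abstract does not specify the weighting rule, the pronunciation units, or the stopping criterion, so everything below is an assumption made for illustration: the per-character weight is a placeholder (the CSC decoder's confidence in the gold character), the fine-grained units are assumed to be initial, final, and tone, and the names `joint_loss`, `iterative_correct`, and `correct_once` are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence


def cross_entropy(probs: Dict[str, float], gold: str) -> float:
    """Negative log-likelihood of the gold label under a probability table."""
    return -math.log(max(probs.get(gold, 0.0), 1e-12))


def joint_loss(
    csc_probs: Sequence[Dict[str, float]],
    gold_chars: Sequence[str],
    cpp_probs: Sequence[Dict[str, Dict[str, float]]],
    gold_prons: Sequence[Dict[str, str]],
) -> float:
    """Average per-character loss: CSC term plus a weighted CPP term.

    ASSUMPTION: the weight below is a placeholder, not the paper's rule.
    ASSUMPTION: pronunciation is split into units such as initial/final/tone.
    """
    total = 0.0
    for char_dist, gold_char, unit_dists, gold_units in zip(
        csc_probs, gold_chars, cpp_probs, gold_prons
    ):
        csc_term = cross_entropy(char_dist, gold_char)
        # Fine-grained CPP: one cross-entropy term per pronunciation unit.
        cpp_term = sum(
            cross_entropy(unit_dists[unit], label)
            for unit, label in gold_units.items()
        )
        # Placeholder adaptive weight, treated as a constant (no gradient).
        weight = char_dist.get(gold_char, 0.0)
        total += csc_term + weight * cpp_term
    return total / len(gold_chars)


def iterative_correct(
    sentence: str, correct_once: Callable[[str], str], max_rounds: int = 3
) -> str:
    """Re-apply a single-pass corrector until the output stops changing.

    ASSUMPTION: a plain fixed-point loop with a round limit; the paper's
    strategy may impose further constraints not stated in the abstract.
    """
    for _ in range(max_rounds):
        updated = correct_once(sentence)
        if updated == sentence:
            break
        sentence = updated
    return sentence
```

In this sketch the auxiliary term contributes more for characters the CSC decoder already handles confidently; a different weighting rule can be substituted by changing the single `weight` line, which is what makes the scheme adaptive per character rather than a fixed global coefficient.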
