Chinese spelling check is the task of detecting and correcting spelling mistakes in Chinese text. Existing research aims to enhance text representations and to use multi-source information to improve the detection and correction capabilities of models, but pays little attention to improving their ability to distinguish between confusable words. Contrastive learning, whose aim is to minimize the distance in representation space between similar sample pairs, has recently become a dominant technique in natural language processing. Inspired by contrastive learning, we present a novel framework for Chinese spelling check, which consists of three modules: language representation, spelling check, and reverse contrastive learning. Specifically, we propose a reverse contrastive learning strategy, which explicitly forces the model to minimize the agreement between similar examples, namely phonetically and visually confusable characters. Experimental results show that our framework is model-agnostic and can be combined with existing Chinese spelling check models to yield state-of-the-art performance.
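The reverse contrastive objective described in the abstract can be illustrated with a small sketch. The abstract does not give the loss, so the form used here (a log-mean-exp over temperature-scaled cosine similarities between a character representation and its confusable characters) is an assumption chosen for illustration, not the authors' formulation. The names `cosine`, `reverse_contrastive_loss`, and `temperature` are likewise hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reverse_contrastive_loss(anchor, confusables, temperature=0.1):
    """Assumed loss: log of the mean of exp(cos(anchor, c) / temperature).

    Minimizing it lowers the similarity between the anchor character's
    representation and those of its confusable characters, which is the
    reverse of pulling similar pairs together.
    """
    scaled = [cosine(anchor, c) / temperature for c in confusables]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    return m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))


# Toy 2-dimensional representations (made-up values, not real embeddings).
anchor = [1.0, 0.0]
close_confusables = [[0.9, 0.1], [0.8, 0.2]]  # nearly aligned with the anchor
far_confusables = [[0.0, 1.0], [-1.0, 0.0]]   # already well separated

loss_close = reverse_contrastive_loss(anchor, close_confusables)
loss_far = reverse_contrastive_loss(anchor, far_confusables)
```

In this toy setup `loss_close` is larger than `loss_far`, so a model trained with such a term would be penalized more when confusable characters sit close to the anchor in representation space. In a full system this term would presumably be added to the spelling check module's own training loss; how the two are weighted is not stated in the abstract.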