We propose a character-based non-autoregressive GEC approach with automatically generated character transformations. Recently, per-word classification of correction edits has proven to be an efficient, parallelizable alternative to current encoder-decoder GEC systems. We show that word replacement edits may be suboptimal and lead to an explosion of rules for spelling, diacritization and errors in morphologically rich languages, and we propose a method for generating character transformations from a GEC corpus. Finally, we train character transformation models for Czech, German and Russian, reaching solid results and a dramatic speedup compared to autoregressive systems. The source code is released at https://github.com/ufal/wnut2021_character_transformations_gec.
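To illustrate why word replacement edits can multiply in morphologically rich languages, here is a minimal sketch. It is not the authors' method and only shows the general idea. It assumes a hypothetical suffix-only transformation: strip k final characters, then append a string. The names `derive_rule` and `apply_rule` and the toy word pairs are illustrative only. Four distinct word replacements collapse into two character rules, and a character rule also applies to a word it was never derived from.

```python
from typing import List, Set, Tuple

# Hypothetical suffix-only character transformation:
# (number of characters to strip from the end of the word, string to append).
Rule = Tuple[int, str]


def derive_rule(source: str, target: str) -> Rule:
    """Derive the suffix rule that rewrites `source` into `target`."""
    # Length of the longest common prefix of the two words.
    prefix = 0
    limit = min(len(source), len(target))
    while prefix < limit and source[prefix] == target[prefix]:
        prefix += 1
    # Replace everything after the common prefix with the rest of the target.
    return len(source) - prefix, target[prefix:]


def apply_rule(word: str, rule: Rule) -> str:
    """Apply a suffix rule to an arbitrary (possibly unseen) word."""
    strip, append = rule
    if strip > len(word):
        raise ValueError(f"cannot strip {strip} characters from {word!r}")
    # Slice with len(word) - strip so that strip == 0 keeps the whole word.
    return word[: len(word) - strip] + append


if __name__ == "__main__":
    # Toy (source, corrected) pairs: Russian accusative and German genitive endings.
    pairs: List[Tuple[str, str]] = [
        ("книга", "книгу"),
        ("машина", "машину"),
        ("Hund", "Hundes"),
        ("Tag", "Tages"),
    ]
    word_rules: Set[Tuple[str, str]] = set(pairs)
    char_rules: Set[Rule] = {derive_rule(s, t) for s, t in pairs}
    # prints "4 word replacements vs 2 character rules"
    print(len(word_rules), "word replacements vs", len(char_rules), "character rules")
    # The rule (1, "у") generalizes to a word not in `pairs`; prints "лампу".
    print(apply_rule("лампа", (1, "у")))
```

This suffix-only rule is deliberately simplistic. A mid-word change such as a missing diacritic would still yield a word-specific rule, so a practical system needs a richer transformation vocabulary than this sketch assumes.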