In this paper, we propose a Hierarchical Transformer model for the Vietnamese spelling correction problem. The model consists of multiple Transformer encoders and uses both character-level and word-level information to detect errors and make corrections. In addition, to facilitate future work on Vietnamese spelling correction, we introduce a realistic dataset collected from real-life texts. We compare our method with other methods and with publicly available systems. The proposed method outperforms all of the contemporary methods in terms of recall, precision, and F1-score. A demo version is publicly available.
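To make the reported metrics concrete, the following is a minimal sketch of how precision, recall, and F1-score could be computed for a spelling correction system. It assumes a token-level evaluation over aligned sequences; the abstract does not specify the exact evaluation protocol, so this is an illustration rather than the authors' procedure. The function name `correction_scores` and the toy tokens are hypothetical.

```python
from typing import List, Tuple


def correction_scores(
    source: List[str], predicted: List[str], gold: List[str]
) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for spelling correction.

    Assumption: the three sequences are aligned token by token
    (same length). This protocol is illustrative, not taken from the paper.
    """
    if not (len(source) == len(predicted) == len(gold)):
        raise ValueError("sequences must be aligned and of equal length")

    # Tokens the system changed, tokens that truly needed a change,
    # and changes that match the gold correction exactly.
    changed = sum(s != p for s, p in zip(source, predicted))
    needed = sum(s != g for s, g in zip(source, gold))
    correct = sum(
        s != p and p == g for s, p, g in zip(source, predicted, gold)
    )

    precision = correct / changed if changed else 0.0
    recall = correct / needed if needed else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example with made-up tokens (not from the paper's dataset).
source = ["toi", "di", "hocc", "ve", "nha"]
gold = ["toi", "di", "hoc", "ve", "nha"]
predicted = ["toi", "đi", "hoc", "ve", "nha"]
precision, recall, f1 = correction_scores(source, predicted, gold)
```

In this toy case the system makes two changes, only one of which matches the gold correction, so precision is lower than recall. A system that over-corrects valid words is penalized in precision, while one that misses errors is penalized in recall.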