To automatically correct handwritten assignments, the traditional approach is to use an OCR model to recognize characters and compare them against the answers. However, OCR models are easily confused by handwritten Chinese characters, and the textual information of the answers is not used during model inference, whereas teachers always keep the answers in mind when reviewing and correcting assignments. In this paper, we focus on correcting Chinese cloze tests and propose a multimodal approach, named AiM, in which the encoded representations of the answers interact with the visual information of students' handwriting. Instead of predicting 'right' or 'wrong', we perform sequence labeling on the answer text to infer, in a fine-grained way, which answer characters differ from the handwritten content. We take samples from OCR datasets as positive samples for this task and develop a negative sample augmentation method to scale up the training data. Experimental results show that AiM outperforms OCR-based methods by a large margin. Extensive studies demonstrate the effectiveness of our multimodal approach.
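The abstract does not spell out how the negative sample augmentation works, so the following is only a minimal sketch of one plausible reading: an OCR sample pairs a handwriting image with its ground-truth transcription (a positive sample, every character matches), and negatives are formed by corrupting some characters of the answer text so that per-character labels mark which answer characters differ from the handwritten content. The function name, `corrupt_prob`, the vocabulary, and the random substitution policy are illustrative assumptions, not the paper's actual procedure.

```python
import random

def make_negative_sample(answer: str, vocab: list, corrupt_prob: float = 0.3):
    """Sketch of negative sample augmentation for fine-grained correction.

    The handwriting image is kept unchanged; the answer text is corrupted so
    that some characters no longer match the handwritten content. Labels are
    per answer character: 1 = differs from the handwriting, 0 = matches,
    giving sequence-labeling targets instead of a single right/wrong label.
    """
    corrupted, labels = [], []
    for ch in answer:
        if random.random() < corrupt_prob:
            # Replace with a different character sampled from the vocabulary.
            replacement = random.choice([c for c in vocab if c != ch])
            corrupted.append(replacement)
            labels.append(1)  # this answer character differs from the handwriting
        else:
            corrupted.append(ch)
            labels.append(0)  # this answer character matches the handwriting
    return "".join(corrupted), labels


# Example: the OCR transcription "春眠不觉晓" paired with its image is a positive
# sample (all labels 0); the corrupted text forms a negative sample for the same image.
vocab = list("春眠不觉晓处闻啼鸟夜来风雨声花落知多少")
neg_text, neg_labels = make_negative_sample("春眠不觉晓", vocab)
print(neg_text, neg_labels)
```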