This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition. We propose a novel loss function -- SoftCTC -- an extension of CTC that allows multiple transcription variants to be considered at the same time. This makes it possible to omit the confidence-based filtering step that is otherwise a crucial component of pseudo-labeling approaches to semi-supervised learning. We demonstrate the effectiveness of our method on a challenging handwriting recognition task and conclude that SoftCTC matches the performance of a finely tuned filtering-based pipeline. We also evaluate SoftCTC in terms of computational efficiency, concluding that it is significantly more efficient than a na\"ive CTC-based approach to training on multiple transcription variants, and we make our GPU implementation publicly available.
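To make the baseline concrete, the following is a minimal sketch (not code from the paper) of the na\"ive CTC-based approach the abstract refers to: standard CTC is evaluated once per transcription variant and the per-variant probabilities are combined, so the cost grows linearly with the number of variants. The function name `naive_multi_variant_ctc` and the `variant_weights` confidence vector are illustrative assumptions; SoftCTC's contribution is to avoid this per-variant loop.

```python
# Hypothetical sketch of the naive multi-variant CTC baseline: one CTC
# forward pass per transcription variant, combined as a weighted mixture
# -log sum_i w_i * p(variant_i | x). All names here are illustrative.
import torch
import torch.nn.functional as F

def naive_multi_variant_ctc(log_probs, variants, input_lengths, variant_weights):
    """log_probs: (T, N, C) log-softmax outputs of the network.
    variants: list of (targets, target_lengths) pairs, one per variant.
    variant_weights: (num_variants,) confidences, assumed to sum to 1.
    Returns the batch-averaged negative log mixture probability."""
    per_variant = []
    for targets, target_lengths in variants:
        # CTC loss is -log p(y | x); keep per-sample values via reduction='none'.
        nll = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                         blank=0, reduction='none', zero_infinity=True)
        per_variant.append(-nll)                        # log p(variant | x), shape (N,)
    log_p = torch.stack(per_variant, dim=0)             # (num_variants, N)
    log_w = torch.log(variant_weights).unsqueeze(1)     # (num_variants, 1)
    return -torch.logsumexp(log_w + log_p, dim=0).mean()
```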