End-to-end text spotting methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually maintain a distinct separation between the detection and recognition branches, requiring precise annotations for both tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework that can be trained in both fully- and weakly-supervised settings. By learning a single latent representation per word detection, and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves performance competitive with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter achieves state-of-the-art results on multiple benchmarks.
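The Hungarian loss mentioned above builds on bipartite matching between predicted word queries and ground-truth annotations, as in DETR-family detectors. The sketch below illustrates the matching step only, with a toy, illustrative cost matrix; the actual cost terms in TTS (and the `hungarian_match` helper name) are not taken from the paper and are assumptions for exposition.

```python
# Sketch of Hungarian-style bipartite matching between predictions and
# ground truth, using SciPy's linear_sum_assignment (the Hungarian algorithm).
import numpy as np
from scipy.optimize import linear_sum_assignment


def hungarian_match(cost_matrix):
    """Return (prediction_idx, ground_truth_idx) pairs that minimize
    the total matching cost over all one-to-one assignments."""
    rows, cols = linear_sum_assignment(cost_matrix)
    return list(zip(rows.tolist(), cols.tolist()))


# Toy example: 3 predicted words vs. 2 ground-truth words. Entry [i, j] is
# the cost of assigning prediction i to ground truth j -- e.g. a recognition
# mismatch term, plus a localization term in the fully-supervised setting
# (in a weakly-supervised setting the localization term would be dropped).
cost = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
])
matches = hungarian_match(cost)
print(matches)  # [(0, 1), (1, 0)]: prediction 2 stays unmatched
```

Once the matching is fixed, the per-pair losses (recognition, and localization when available) are summed over matched pairs, which is what lets the same objective work with or without box annotations.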