Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. To enable this new research task, we introduce the first hierarchical scene text dataset. We also propose a novel method that simultaneously detects scene text and forms text clusters in a unified way. Comprehensive experiments show that our unified model achieves better performance than multiple well-designed baseline methods. Additionally, this model achieves state-of-the-art results on multiple scene text detection datasets without the need for complex post-processing. Dataset and code: https://github.com/google-research-datasets/hiertext and https://github.com/tensorflow/models/tree/master/official/projects/unified_detector.