Segmentation-based methods have recently been widely adopted for arbitrary-shaped scene text detection, since they make accurate pixel-level predictions on curved text instances and enable real-time inference without time-consuming anchor processing. However, current segmentation-based models are unable to learn the shapes of curved text and often require complex label assignments or repeated feature aggregations to detect it more accurately. In this paper, we propose RSCA, a Real-time Segmentation-based Context-Aware model for arbitrary-shaped scene text detection, which sets a strong baseline with two simple yet effective strategies: Local Context-Aware Upsampling, which models local spatial transformation, and Dynamic Text-Spine Labeling, which simplifies label assignment. Based on these strategies, RSCA achieves state-of-the-art performance in both speed and accuracy without complex label assignments or repeated feature aggregations. We conduct extensive experiments on multiple benchmarks to validate the effectiveness of our method. RSCA-640 reaches an F-measure of 83.9% at 48.3 FPS on the CTW1500 dataset.