We propose FAST, an accurate and efficient scene text detection framework (i.e., a faster arbitrarily-shaped text detector). Unlike recent advanced text detectors that use complicated post-processing and hand-crafted network architectures, resulting in low inference speed, FAST introduces two new designs. (1) We design a minimalist kernel representation (a single-channel output) to model text of arbitrary shape, together with a GPU-parallel post-processing step that assembles text lines with negligible time overhead. (2) We search a network architecture tailored for text detection, yielding more powerful features than most networks searched for image classification. Benefiting from these two designs, FAST achieves an excellent trade-off between accuracy and efficiency on several challenging datasets, including Total-Text, CTW1500, ICDAR 2015, and MSRA-TD500. For example, FAST-T yields 81.6% F-measure at 152 FPS on Total-Text, outperforming the previous fastest method by 1.7 points in accuracy and 70 FPS in speed. With TensorRT optimization, the inference speed can be further accelerated to over 600 FPS. Code and models will be released at https://github.com/czczup/FAST.
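To make the kernel-based pipeline concrete, the following is a minimal CPU sketch, under our own assumptions, of how a single-channel kernel map could be turned into text instances: label the connected components of the predicted kernels, then repeatedly dilate each component into the background. The function names (`label_components`, `dilate_labels`) and the 3x3 max-dilation are illustrative choices, not the paper's exact implementation; on a GPU the dilation step would correspond to a parallel max-pooling-style operation.

```python
import numpy as np
from collections import deque

def label_components(binary):
    """4-connected component labeling of a binary kernel map via BFS.
    Each connected kernel region receives a distinct integer label."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    cur = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and labels[y, x] == 0:
                cur += 1
                labels[y, x] = cur
                q = deque([(y, x)])
                while q:
                    cy, cx = q.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = cur
                            q.append((ny, nx))
    return labels, cur

def dilate_labels(labels, iterations=2):
    """Grow each labeled kernel with a 3x3 maximum filter (the GPU
    analogue is max-pooling). Only background pixels are filled, so
    existing instances are never overwritten; where two growth fronts
    meet, the larger label wins (an arbitrary tie-break for this sketch)."""
    out = labels.copy()
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1)
        # 3x3 neighborhood maximum at every pixel, via 9 shifted views
        neigh = np.max(
            [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
            axis=0,
        )
        out = np.where(out > 0, out, neigh)
    return out
```

For example, a map with two separated kernel pixels yields two labeled instances, each expanded by two dilation rounds into a small region around its kernel while remaining distinct from its neighbor.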