End-to-end text spotting has attracted great attention recently due to its benefits of global optimization and high maintainability for real applications. However, the input scale has always been a tough trade-off, since recognizing a small text instance usually requires enlarging the whole image, which brings high computational costs. In this paper, to address this problem, we propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework, which aims to infer images at different small but recognizable resolutions and achieve a better balance between accuracy and efficiency. Concretely, we adopt a resolution selector to dynamically decide the input resolution for each image, constrained by both inference accuracy and computational cost. In addition, a sequential knowledge distillation strategy is conducted on the text recognition branch, enabling low-resolution inputs to obtain performance comparable to high-resolution images. The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve its practicality. Extensive experiments on several text spotting benchmarks show that the proposed method vastly improves the usability of low-resolution models. The code is available at https://github.com/hikopensource/DAVAR-Lab-OCR/.
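To make the two core mechanisms concrete, below is a minimal PyTorch-style sketch of the idea; it is not the authors' released implementation. The candidate scale set, the Gumbel-softmax selector, the cost weight `lam`, and the per-token KL divergence used as a stand-in for the paper's sequential distillation are all illustrative assumptions.

```python
# Sketch of the DLD idea: a selector picks one of several downscaled inputs
# per image, and a distillation loss pulls the low-res recognition logits
# toward those of a fixed high-res teacher. Names and hyper-parameters are
# illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

CANDIDATE_SCALES = [0.5, 0.75, 1.0]  # hypothetical candidate input scales


class ResolutionSelector(nn.Module):
    """Predicts a distribution over candidate scales from a cheap thumbnail."""

    def __init__(self, num_scales: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_scales),
        )

    def forward(self, thumbnail: torch.Tensor) -> torch.Tensor:
        logits = self.backbone(thumbnail)
        # Straight-through Gumbel-softmax keeps the discrete resolution
        # choice differentiable, so the whole pipeline trains end-to-end.
        return F.gumbel_softmax(logits, tau=1.0, hard=True)


def dld_losses(student_logits, teacher_logits, scale_probs, flops_per_scale,
               T=2.0, lam=0.1):
    """Distillation loss plus a computational-cost penalty on the selector.

    student_logits / teacher_logits: (batch, seq_len, vocab) recognition
        outputs from the low-res student and the high-res teacher.
    scale_probs: (batch, num_scales) one-hot (straight-through) choices.
    flops_per_scale: (num_scales,) relative cost of each candidate scale.
    """
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Expected compute of the chosen resolutions, e.g. pass
    # torch.tensor([0.25, 0.5625, 1.0]) for the scales above.
    cost = (scale_probs * flops_per_scale).sum(dim=-1).mean()
    return kd + lam * cost
```

The cost term is what constrains the selector by computational budget as well as accuracy: without it, the selector would trivially prefer the highest resolution.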