The feature extractor plays a critical role in text recognition (TR), but customizing its architecture is relatively underexplored because manual tweaking is expensive. In this work, inspired by the success of neural architecture search (NAS), we propose to search for suitable feature extractors. We design a domain-specific search space by exploring principles for good feature extractors. The space includes a 3D-structured space for the spatial model and a transformer-based space for the sequential model. As the space is huge and complexly structured, no existing NAS algorithm can be applied directly. We propose a two-stage algorithm to search the space effectively. In the first stage, we cut the space into several blocks and progressively train each block with the help of an auxiliary head. In the second stage, we introduce a latency constraint and search for sub-networks within the trained supernet via natural gradient descent. In experiments, a series of ablation studies is performed to better understand the designed space, the search algorithm, and the searched architectures. We also compare the proposed method with various state-of-the-art ones on both handwritten and scene TR tasks. Extensive results show that our approach achieves better recognition performance with lower latency.
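To make the second-stage search concrete, below is a minimal sketch of latency-constrained sub-network search via natural gradient descent, assuming a per-layer categorical distribution over candidate operations. Everything here is a hypothetical stand-in: `NUM_LAYERS`, `NUM_OPS`, and `reward()` (which would query the trained supernet for validation accuracy and measured latency) are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

# Hypothetical sizes: 5 searchable layers, 4 candidate ops per layer.
NUM_LAYERS, NUM_OPS = 5, 4

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reward(arch, latency_budget=1.0):
    """Placeholder reward: accuracy of the sampled sub-network,
    penalized when its latency exceeds the budget. In the real
    system both terms would come from the trained supernet."""
    acc = np.random.rand()           # stand-in for validation accuracy
    lat = 0.2 * (arch + 1).mean()    # stand-in for measured latency
    return acc - max(0.0, lat - latency_budget)

# Search loop: natural gradient ascent on the expected
# latency-penalized reward of the architecture distribution.
logits = np.zeros((NUM_LAYERS, NUM_OPS))
lr = 0.1
for step in range(200):
    probs = softmax(logits)
    # Sample one candidate op per layer to form a sub-network.
    arch = np.array([np.random.choice(NUM_OPS, p=p) for p in probs])
    r = reward(arch)
    # REINFORCE gradient of log-probability for a softmax
    # categorical: one-hot(sampled op) minus the probabilities.
    grad = -probs
    grad[np.arange(NUM_LAYERS), arch] += 1.0
    grad *= r
    # Natural gradient: precondition by the inverse Fisher matrix.
    # For a softmax categorical, F = diag(p) - p p^T per layer; it is
    # singular, so we add a small damping term before solving.
    for l in range(NUM_LAYERS):
        p = probs[l]
        fisher = np.diag(p) - np.outer(p, p) + 1e-3 * np.eye(NUM_OPS)
        logits[l] += lr * np.linalg.solve(fisher, grad[l])

# The final architecture takes the most probable op in each layer.
best_arch = softmax(logits).argmax(axis=-1)
```

The Fisher preconditioning rescales the update so that steps are taken in distribution space rather than raw parameter space; the damping term is a common numerical-stability choice, not something specified by the abstract.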