Pure Transformer with Integrated Experts for Scene Text Recognition - 专知 (Zhuanzhi) Papers


November 9, 2022

Pure Transformer with Integrated Experts for Scene Text Recognition


Yew Lee Tan, Adams Wai-kin Kong, Jung-Jae Kim

From arXiv; accepted at ECCV 2022

Scene text recognition (STR) is the task of reading text in cropped images of natural scenes. Conventional STR models employ a convolutional neural network (CNN) followed by a recurrent neural network in an encoder-decoder framework. Recently, the transformer architecture has been widely adopted in STR because it shows a strong capability for capturing long-term dependencies, which appear to be prominent in scene text images. Many researchers have used the transformer as part of a hybrid CNN-transformer encoder, often followed by a transformer decoder. However, such methods exploit long-term dependencies only midway through the encoding process. Although the vision transformer (ViT) can capture such dependencies at an early stage, it remains largely unexploited in STR. This work proposes a transformer-only model as a simple baseline that outperforms hybrid CNN-transformer models. Furthermore, two key areas for improvement are identified. First, the first decoded character has the lowest prediction accuracy. Second, images of different original aspect ratios react differently to the patch resolution, while ViT employs only one fixed patch resolution. To explore these areas, Pure Transformer with Integrated Experts (PTIE) is proposed. PTIE is a transformer model that can process multiple patch resolutions and decode in both the original and reverse character orders. It is evaluated on 7 commonly used benchmarks and compared with over 20 state-of-the-art methods. The experimental results show that the proposed method outperforms them and obtains state-of-the-art results on most benchmarks.
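To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It shows how the choice of patch resolution changes the grid of tokens a ViT-style encoder sees, and how a label can be turned into target sequences for decoding in the original and in the reverse character order. The image size (32x128), the two patch sizes, the function names, and the `<s>` / `</s>` marker tokens are all illustrative assumptions that do not come from the abstract.

```python
from typing import List, Tuple


def patch_grid(height: int, width: int, patch_h: int, patch_w: int) -> Tuple[int, int]:
    """Return (rows, cols) of non-overlapping patches that tile the image.

    Hypothetical helper: a ViT-style encoder turns each patch into one token,
    so rows * cols is the encoder sequence length.
    """
    if height % patch_h != 0 or width % patch_w != 0:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch_h, width // patch_w


def decoding_targets(text: str, sos: str = "<s>", eos: str = "</s>") -> Tuple[List[str], List[str]]:
    """Build target sequences for original-order and reverse-order decoding.

    The start/end marker tokens are assumptions made for this sketch.
    """
    chars = list(text)
    forward = [sos] + chars + [eos]
    backward = [sos] + chars[::-1] + [eos]
    return forward, backward


if __name__ == "__main__":
    # Assumed input size and patch resolutions, chosen only for illustration.
    for patch_h, patch_w in [(4, 8), (8, 4)]:
        rows, cols = patch_grid(32, 128, patch_h, patch_w)
        print(f"patch {patch_h}x{patch_w}: {rows} rows x {cols} cols = {rows * cols} tokens")

    forward, backward = decoding_targets("text")
    print("original order:", forward)
    print("reverse order: ", backward)
```

With these assumed sizes, both patch resolutions yield the same number of tokens but split the image differently along height and width, which is one way images of different aspect ratios could respond differently to the patch choice. In reverse-order decoding, the last character of the word becomes the first decoded character.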


