SPS v2:单点场景文本斑点 (SPTS v2: Single-Point Scene Text Spotting) - 专知论文

会员服务 ·

0

解码 · 示例 · Extensibility · 相同 · INTERACT ·

2023 年 1 月 4 日

SPTS v2: Single-Point Scene Text Spotting

翻译：SPS v2:单点场景文本斑点

Yuliang Liu,Jiaxin Zhang,Dezhi Peng,Mingxin Huang,Xinyu Wang,Jingqun Tang,Can Huang,Dahua Lin,Chunhua Shen,Xiang Bai,Lianwen Jin

from arxiv, arXiv admin note: text overlap with arXiv:2112.07917

End-to-end scene text spotting has made significant progress due to its intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles,and polygons as a prerequisite, which are much more expensive than using single-point. For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost single-point annotation by the proposed framework, termed SPTS v2. SPTS v2 reserves the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD) through sequentially predicting the center points of all text instances inside the same predicting sequence, while with a Parallel Recognition Decoder (PRD) for text recognition in parallel. These two decoders share the same parameters and are interactively connected with a simple but effective information transmission process to pass the gradient and information. Comprehensive experiments on various existing benchmark datasets demonstrate the SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving 14x faster inference speed. Most importantly, within the scope of our SPTS v2, extensive experiments further reveal an important phenomenon that single-point serves as the optimal setting for the scene text spotting compared to non-point, rectangular bounding box, and polygonal bounding box. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms. Code is available at https://github.com/shannanyinxiang/SPTS.

翻译：端到端的场景文本检测由于文本检测和识别之间的内在协同作用而取得了显著进展。以往的方法通常将横向矩形、旋转矩形、夸德罗列和多边形等手动说明视为比使用单点更昂贵的前提条件。我们第一次证明, 培训场景文本检测模型可以通过拟议框架的极低成本单点注释实现, 称为 SPBS v2. CTOS v2 保存自动回溯式变异器的优势。通过连续预测同一预测序列内所有文本事件的中心点, 而平行识别解码(PRD)则比使用单点要贵得多。这两个解coder共享相同的参数, 并且与简单而有效的单点信息传输程序互动。对各种现有基准数据集的全面实验表明, CTBS v2 能够超越先前的州- 单点文本显示器( IADD) 的优势, 连续预测同一预测序列内所有文本的中点, 而同时实现14x 边的平行解码(PRD), SP 提供更快速的场景端点的场景空间测试速度。

0

相关内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【资源】文本风格迁移相关资源汇总

【资源】文本风格迁移相关资源汇总

专知

13+阅读 · 2020年7月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

CVPR2019 | 15篇论文速递（涵盖目标检测、语义分割和姿态估计等方向）

CVPR2019 | 15篇论文速递（涵盖目标检测、语义分割和姿态估计等方向）

AI研习社

15+阅读 · 2019年5月8日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

专知

18+阅读 · 2018年9月24日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

mPFC神经环路中突触结构重塑与慢性应激大鼠抑郁样行为的关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

β-catenin/Ets1复合体在胶质母细胞瘤中对hTERT表达调控机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

具有临界指数的Schrodinger-Poisson系统的解

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

变分方法与非线性偏微分方程前沿问题

国家自然科学基金

2+阅读 · 2012年12月31日

Schrodinger-Poisson方程的若干问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

图的能量与排序问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

抽象时滞发展方程周期解的存在性及渐近性态

国家自然科学基金

0+阅读 · 2012年12月31日

miR-125b相关ceRNAs网络调控乳腺癌转移机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

X&Fuse: Fusing Visual Information in Text-to-Image Generation

Arxiv

0+阅读 · 2023年3月2日

A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images

Arxiv

0+阅读 · 2023年2月28日

Plan-then-Seam: Towards Efficient Table-to-Text Generation

Arxiv

0+阅读 · 2023年2月28日

Turning a CLIP Model into a Scene Text Detector

Arxiv

0+阅读 · 2023年2月28日

RGB-D Grasp Detection via Depth Guided Learning with Cross-modal Attention

Arxiv

0+阅读 · 2023年2月28日

Exploiting Fine-grained Face Forgery Clues via Progressive Enhancement Learning

Arxiv

12+阅读 · 2021年12月28日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

Rotation-Sensitive Regression for Oriented Scene Text Detection

Arxiv

13+阅读 · 2018年3月14日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《人工智能技术提升军事不确定性环境下领导决策能力研究》180页

以机器速度锁定目标：人工智能的能力与局限

中文版 | 革新国家安全：国防情报离线本地部署大语言模型

《美军21世纪医疗抵消战略》

相关资讯

【资源】文本风格迁移相关资源汇总

【资源】文本风格迁移相关资源汇总

专知

13+阅读 · 2020年7月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

CVPR2019 | 15篇论文速递（涵盖目标检测、语义分割和姿态估计等方向）

CVPR2019 | 15篇论文速递（涵盖目标检测、语义分割和姿态估计等方向）

AI研习社

15+阅读 · 2019年5月8日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

专知

18+阅读 · 2018年9月24日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

X&Fuse: Fusing Visual Information in Text-to-Image Generation

Arxiv

0+阅读 · 2023年3月2日

A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images

Arxiv

0+阅读 · 2023年2月28日

Plan-then-Seam: Towards Efficient Table-to-Text Generation

Arxiv

0+阅读 · 2023年2月28日

Turning a CLIP Model into a Scene Text Detector

Arxiv

0+阅读 · 2023年2月28日

RGB-D Grasp Detection via Depth Guided Learning with Cross-modal Attention

Arxiv

0+阅读 · 2023年2月28日

Exploiting Fine-grained Face Forgery Clues via Progressive Enhancement Learning

Arxiv

12+阅读 · 2021年12月28日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

Rotation-Sensitive Regression for Oriented Scene Text Detection

Arxiv

13+阅读 · 2018年3月14日

相关基金

mPFC神经环路中突触结构重塑与慢性应激大鼠抑郁样行为的关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

β-catenin/Ets1复合体在胶质母细胞瘤中对hTERT表达调控机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

具有临界指数的Schrodinger-Poisson系统的解

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

变分方法与非线性偏微分方程前沿问题

国家自然科学基金

2+阅读 · 2012年12月31日

Schrodinger-Poisson方程的若干问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

图的能量与排序问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

抽象时滞发展方程周期解的存在性及渐近性态

国家自然科学基金

0+阅读 · 2012年12月31日

miR-125b相关ceRNAs网络调控乳腺癌转移机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员