SPIN: 结构保护内偏移网络,以识别场景文字文字识别 (SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition) - 专知论文

会员服务 ·

0

Extensibility · SPIN · Networking · 偏移量 · 变换 ·

2021 年 10 月 25 日

SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition

翻译：SPIN: 结构保护内偏移网络,以识别场景文字文字识别

Chengwei Zhang,Yunlu Xu,Zhanzhan Cheng,Shiliang Pu,Yi Niu,Fei Wu,Futai Zou

from arxiv, Accepted to AAAI21. Code is available at https://davar-lab.github.io/publication.html or https://github.com/hikopensource/DAVAR-Lab-OCR

Arbitrary text appearance poses a great challenge in scene text recognition tasks. Existing works mostly handle with the problem in consideration of the shape distortion, including perspective distortions, line curvature or other style variations. Therefore, methods based on spatial transformers are extensively studied. However, chromatic difficulties in complex scenes have not been paid much attention on. In this work, we introduce a new learnable geometric-unrelated module, the Structure-Preserving Inner Offset Network (SPIN), which allows the color manipulation of source data within the network. This differentiable module can be inserted before any recognition architecture to ease the downstream tasks, giving neural networks the ability to actively transform input intensity rather than the existing spatial rectification. It can also serve as a complementary module to known spatial transformations and work in both independent and collaborative ways with them. Extensive experiments show that the use of SPIN results in a significant improvement on multiple text recognition benchmarks compared to the state-of-the-arts.

翻译：任意的文本外观在现场文本识别任务中构成了巨大的挑战。现有工作主要处理形状扭曲问题,包括观点扭曲、线曲曲或其他风格变异。因此,对基于空间变压器的方法进行了广泛研究。但是,对复杂场面的染色体困难没有给予多少重视。在这项工作中,我们引入了一个新的可学习的与几何无关的模块,即结构保护内部偏移网络(SPIN),它允许网络内部源数据进行色控。在任何识别结构之前可以插入这一不同的模块,以缓解下游任务,使神经网络有能力积极转换输入强度,而不是现有的空间整化。它也可以作为已知的空间变换和独立合作方式工作的一个补充模块。广泛的实验表明,使用SPIN的结果是大大改进了多个文本识别基准,与艺术现状相比。

0

相关内容

Extensibility

iOS 8 提供的应用间和应用跟系统的功能交互特性。

Today (iOS and OS X): widgets for the Today view of Notification Center
Share (iOS and OS X): post content to web services or share content with others
Actions (iOS and OS X): app extensions to view or manipulate inside another app
Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from a third-party apps
Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
Custom Keyboard (iOS): system-wide alternative keyboards

Source: iOS 8 Extensions: Apple’s Plan for a Powerful App Ecosystem

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

专知会员服务

27+阅读 · 2020年8月6日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【华为-诺亚实验室】动态BERT, Dynamic BERT with Adaptive Width and Depth

【华为-诺亚实验室】动态BERT, Dynamic BERT with Adaptive Width and Depth

专知会员服务

24+阅读 · 2020年4月13日

【论文】结构GANs，Structured GANs，

【论文】结构GANs，Structured GANs，

专知会员服务

15+阅读 · 2020年1月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

“CVPR 2020 接受论文列表 1470篇论文都在这了

“CVPR 2020 接受论文列表 1470篇论文都在这了

专知

71+阅读 · 2020年6月10日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

内涵网络嵌入：Content-rich Network Embedding

内涵网络嵌入：Content-rich Network Embedding

我爱读PAMI

4+阅读 · 2019年11月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

一文读懂依存句法分析

一文读懂依存句法分析

AINLP

16+阅读 · 2019年4月28日

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec智能推荐

6+阅读 · 2019年3月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Arxiv

11+阅读 · 2021年12月16日

Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Arxiv

3+阅读 · 2020年5月13日

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Arxiv

10+阅读 · 2020年3月31日

SwapText: Image Based Texts Transfer in Scenes

SwapText: Image Based Texts Transfer in Scenes

Arxiv

4+阅读 · 2020年3月18日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Using Scene Graph Context to Improve Image Generation

Using Scene Graph Context to Improve Image Generation

Arxiv

3+阅读 · 2019年1月15日

Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present

Arxiv

6+阅读 · 2018年4月7日

Reconstruction Network for Video Captioning

Arxiv

5+阅读 · 2018年3月30日

Arbitrarily-Oriented Text Recognition

Arxiv

3+阅读 · 2017年11月12日

VIP会员

文章信息

相关主题

相关VIP内容

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

专知会员服务

27+阅读 · 2020年8月6日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【华为-诺亚实验室】动态BERT, Dynamic BERT with Adaptive Width and Depth

【华为-诺亚实验室】动态BERT, Dynamic BERT with Adaptive Width and Depth

专知会员服务

24+阅读 · 2020年4月13日

【论文】结构GANs，Structured GANs，

【论文】结构GANs，Structured GANs，

专知会员服务

15+阅读 · 2020年1月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

“CVPR 2020 接受论文列表 1470篇论文都在这了

“CVPR 2020 接受论文列表 1470篇论文都在这了

专知

71+阅读 · 2020年6月10日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

内涵网络嵌入：Content-rich Network Embedding

内涵网络嵌入：Content-rich Network Embedding

我爱读PAMI

4+阅读 · 2019年11月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

一文读懂依存句法分析

一文读懂依存句法分析

AINLP

16+阅读 · 2019年4月28日

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec智能推荐

6+阅读 · 2019年3月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Arxiv

11+阅读 · 2021年12月16日

Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Arxiv

3+阅读 · 2020年5月13日

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Arxiv

10+阅读 · 2020年3月31日

SwapText: Image Based Texts Transfer in Scenes

SwapText: Image Based Texts Transfer in Scenes

Arxiv

4+阅读 · 2020年3月18日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Using Scene Graph Context to Improve Image Generation

Using Scene Graph Context to Improve Image Generation

Arxiv

3+阅读 · 2019年1月15日

Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present

Arxiv

6+阅读 · 2018年4月7日

Reconstruction Network for Video Captioning

Arxiv

5+阅读 · 2018年3月30日

Arbitrarily-Oriented Text Recognition

Arxiv

3+阅读 · 2017年11月12日

微信扫码咨询专知VIP会员