DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer

November 28, 2022

Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, Dacheng Tao

From arXiv; accepted to AAAI 2023

Recently, Transformer-based methods, which predict polygon points or Bezier curve control points to localize text, have become popular in scene text detection. However, these methods, built upon the detection transformer framework, might achieve sub-optimal training efficiency and performance due to coarse positional query modeling. In addition, the point label form used in previous works implies the human reading order, which, from our observation, impedes detection robustness. To address these challenges, this paper proposes a concise Dynamic Point Text DEtection TRansformer network, termed DPText-DETR. In detail, DPText-DETR directly leverages explicit point coordinates to generate position queries and dynamically updates them in a progressive way. Moreover, to improve the spatial inductive bias of non-local self-attention in Transformer, we present an Enhanced Factorized Self-Attention module, which provides the point queries within each instance with circular shape guidance. Furthermore, we design a simple yet effective positional label form to tackle the side effect of the previous form. To further evaluate the impact of different label forms on detection robustness in real-world scenarios, we establish an Inverse-Text test set containing 500 manually labeled images. Extensive experiments demonstrate the high training efficiency, robustness, and state-of-the-art performance of our method on popular benchmarks. The code and the Inverse-Text test set are available at https://github.com/ymy-k/DPText-DETR.
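The idea of building position queries from explicit point coordinates and refining them progressively can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a standard sinusoidal encoding of normalized coordinates and a refinement step performed in inverse-sigmoid space, neither of which is specified in the abstract. The function names `sinusoidal_embedding`, `point_queries`, and `refine_points` and the parameters `dim` and `temperature` are hypothetical.

```python
import math


def sinusoidal_embedding(value, dim=8, temperature=10000.0):
    """Encode a scalar in [0, 1] as `dim` interleaved sin/cos features (assumed encoding)."""
    feats = []
    for i in range(dim // 2):
        freq = temperature ** (2 * i / dim)
        angle = 2 * math.pi * value / freq
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats


def point_queries(points, dim=8):
    """Map each normalized (x, y) point to a position query of length 2 * dim."""
    return [sinusoidal_embedding(x, dim) + sinusoidal_embedding(y, dim)
            for x, y in points]


def inverse_sigmoid(p, eps=1e-5):
    """Logit of p, clamped away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def refine_points(points, offsets):
    """One progressive update: add offsets in logit space, then map back into (0, 1)."""
    return [(sigmoid(inverse_sigmoid(x) + dx), sigmoid(inverse_sigmoid(y) + dy))
            for (x, y), (dx, dy) in zip(points, offsets)]


# Hypothetical control points of one text instance, in normalized image coordinates.
points = [(0.2, 0.3), (0.4, 0.3), (0.4, 0.5), (0.2, 0.5)]

# In a real decoder the offsets would be predicted per layer; here they are fixed by hand.
offsets = [(0.1, 0.0)] * len(points)

queries = point_queries(points)            # position queries for the current layer
points = refine_points(points, offsets)    # updated points feed the next layer
next_queries = point_queries(points)       # queries regenerated from the new points
```

In this sketch the queries are recomputed from the updated coordinates after every refinement step, which is one plausible reading of "dynamically updates them in a progressive way".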
