Recent works have shown the great success of deep learning models on common in-vocabulary (IV) scene text recognition. However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance, and state-of-the-art (SOTA) recognition models usually perform poorly in OOV settings. Motivated by the intuition that the learned language prior limits OOV performance, we design a framework named Vision Language Adaptive Mutual Decoder (VLAMD) to partly tackle the OOV problem. VLAMD consists of three main components. First, we build an attention-based LSTM decoder with two adaptively merged visual-only modules, yielding a vision-language balanced main branch. Second, we add an auxiliary query-based autoregressive transformer decoding head for learning common visual and language prior representations. Finally, we couple these two designs with bidirectional training for more diverse language modeling, and perform mutual sequential decoding to obtain more robust results. Our approach achieved 70.31\% and 59.61\% word accuracy on the IV+OOV and OOV settings, respectively, of the Cropped Word Recognition Task of the OOV-ST Challenge at the ECCV 2022 TiE Workshop, ranking 1st in both settings.
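To make the first component more concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of one way an attention-based LSTM decoder state could be adaptively merged with a visual-only attention read-out through a learned gate. The module name \texttt{AdaptiveVisionLanguageCell}, all layer names and shapes, and the sigmoid-gate fusion are illustrative assumptions rather than details taken from VLAMD.

\begin{verbatim}
# Hypothetical sketch of an adaptive vision-language merge (not the VLAMD code).
import torch
import torch.nn as nn


class AdaptiveVisionLanguageCell(nn.Module):
    def __init__(self, vis_dim: int, hid_dim: int, vocab_size: int):
        super().__init__()
        self.lstm = nn.LSTMCell(vis_dim + hid_dim, hid_dim)  # language-aware branch
        self.attn_q = nn.Linear(hid_dim, vis_dim)             # query for visual attention
        self.gate = nn.Linear(vis_dim + hid_dim, 1)           # adaptive merge weight
        self.classifier = nn.Linear(vis_dim + hid_dim, vocab_size)

    def forward(self, feats, prev_emb, state):
        # feats: (B, N, vis_dim) visual features; prev_emb: (B, hid_dim) previous token embedding
        h, c = state
        # Visual-only read-out: attend over image features with the hidden state as query.
        scores = torch.bmm(feats, self.attn_q(h).unsqueeze(-1)).squeeze(-1)       # (B, N)
        ctx = torch.bmm(scores.softmax(dim=-1).unsqueeze(1), feats).squeeze(1)     # (B, vis_dim)
        # Language-aware update: the LSTM mixes the visual context with the token history.
        h, c = self.lstm(torch.cat([ctx, prev_emb], dim=-1), (h, c))
        # Adaptive gate balances the visual-only context against the language-aware state.
        g = torch.sigmoid(self.gate(torch.cat([ctx, h], dim=-1)))                  # (B, 1)
        fused = torch.cat([g * ctx, (1.0 - g) * h], dim=-1)
        return self.classifier(fused), (h, c)
\end{verbatim}

The intent of such a gate is that characters poorly supported by the language prior (e.g. OOV words) can lean more on the visual-only context, while in-vocabulary characters can still benefit from the language-aware state.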