VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment - Zhuanzhi Papers


February 15, 2023

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa

Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accurate bounding box annotations are expensive to collect and use for supervision at scale. In this work, we propose VoLTA (Vision-Language Transformer with weakly-supervised local-feature Alignment), a new VLP paradigm that only utilizes image-caption data but achieves fine-grained region-level image understanding, eliminating the use of expensive box annotations. VoLTA adopts graph optimal transport-based weakly-supervised alignment on local image patches and text tokens to germinate an explicit, self-normalized, and interpretable low-level matching criterion. In addition, VoLTA pushes multi-modal fusion deep into the uni-modal backbones during pre-training and removes fusion-specific transformer layers, further reducing memory requirements. Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream performance, often outperforming methods using significantly more caption and box annotations.
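
The abstract describes the alignment only at a high level. As a rough illustration of the optimal-transport idea (not VoLTA's actual implementation), here is a minimal pure-Python sketch that scores how well a set of image-patch embeddings matches a set of text-token embeddings using entropy-regularized optimal transport (Sinkhorn-Knopp iterations). Graph optimal transport (GOT) generally combines a Wasserstein term for node (patch-token) matching with a Gromov-Wasserstein term for edge matching; this sketch covers only the Wasserstein part. The cosine cost, the uniform marginals, the `eps` and `n_iters` defaults, and the function names `cosine_cost`, `sinkhorn_plan`, and `wasserstein_alignment` are all hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cosine_cost(patches: List[Vector], tokens: List[Vector]) -> Matrix:
    """Cost C[i][j] = 1 - cos(patch_i, token_j); 0 for identical directions, 2 for opposite."""
    def norm(v: Vector) -> float:
        return math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors

    p_norms = [norm(p) for p in patches]
    t_norms = [norm(t) for t in tokens]
    return [
        [1.0 - sum(a * b for a, b in zip(p, t)) / (pn * tn) for t, tn in zip(tokens, t_norms)]
        for p, pn in zip(patches, p_norms)
    ]


def sinkhorn_plan(cost: Matrix, eps: float = 0.1, n_iters: int = 200) -> Matrix:
    """Entropy-regularized transport plan between uniform marginals (Sinkhorn-Knopp)."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass over image patches (hypothetical choice)
    b = [1.0 / m] * m  # uniform mass over text tokens (hypothetical choice)
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def wasserstein_alignment(patches: List[Vector], tokens: List[Vector],
                          eps: float = 0.1, n_iters: int = 200) -> float:
    """Transport cost <T, C> under the Sinkhorn plan; lower means better aligned."""
    cost = cosine_cost(patches, tokens)
    plan = sinkhorn_plan(cost, eps, n_iters)
    return sum(t * c for plan_row, cost_row in zip(plan, cost)
               for t, c in zip(plan_row, cost_row))
```

With identical patch and token embeddings such as `[[1.0, 0.0], [0.0, 1.0]]`, the score is close to zero; negating the token embeddings pushes it to roughly 1, because the plan then routes mass to the cheapest (orthogonal) pairs. Because the transport plan must satisfy fixed marginals, every patch and every token contributes mass to the matching, which is one way to read the abstract's description of the criterion as self-normalized. How VoLTA weights, combines, or back-propagates such a term is not shown here.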
