As one of the most fundamental techniques in multimodal learning, cross-modal matching aims to project different sensory modalities into a shared feature space. Achieving this requires massive, correctly aligned data pairs for model training. However, unlike unimodal datasets, multimodal datasets are far harder to collect and annotate precisely. As an alternative, co-occurring data pairs (e.g., image-text pairs) crawled from the Internet have been widely exploited in this area. Unfortunately, such cheaply collected datasets inevitably contain many mismatched pairs, which have been shown to harm model performance. To address this, we propose a general framework called BiCro (Bidirectional Cross-modal similarity consistency), which can be easily integrated into existing cross-modal matching models to improve their robustness against noisy data. Specifically, BiCro estimates soft labels for noisy data pairs that reflect their true degree of correspondence. The basic idea of BiCro is that, taking image-text matching as an example, similar images should have similar textual descriptions and vice versa; the consistency between these two similarities can then be recast as an estimated soft label for training the matching model. Experiments on three popular cross-modal matching datasets demonstrate that our method significantly improves the noise-robustness of various matching models and surpasses the state-of-the-art by a clear margin.
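The soft-label idea can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's exact formulation: given embeddings of noisy image-text pairs and a small set of assumed-clean anchor pairs, it scores each noisy pair by how consistently the image's similarity profile over anchor images matches the text's similarity profile over the corresponding anchor texts. The function name `bicro_soft_labels` and the specific consistency measure (`exp` of negative mean absolute difference) are illustrative assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity between two sets of vectors: (N, d) x (K, d) -> (N, K)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def bicro_soft_labels(img_emb, txt_emb, anchor_img, anchor_txt):
    """Illustrative soft-label estimate (not the paper's exact formula).

    For each noisy (image, text) pair, compare the image's similarities to
    clean anchor images against the text's similarities to the matching
    anchor texts.  If the two similarity profiles agree (similar images do
    have similar texts), the pair is likely well-matched -> label near 1.
    """
    si = cosine_sim(img_emb, anchor_img)   # (N, K) image-side similarities
    st = cosine_sim(txt_emb, anchor_txt)   # (N, K) text-side similarities
    # One simple consistency measure: perfectly consistent profiles give 1,
    # divergent profiles decay toward 0.
    return np.exp(-np.abs(si - st).mean(axis=1))
```

A matching model could then weight each pair's loss by its soft label, so mismatched web-crawled pairs contribute little to training while clean pairs keep full weight.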