[CVPR 2022] Balanced Audio-visual Learning via On-the-fly Gradient Modulation


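CVPR 2022 · Audio-visual learning · Multimodal · On-the-fly gradient modulation · Gaoling School of Artificial Intelligence, Renmin University of China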

March 12, 2022


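Paper title: Balanced Audio-visual Learning via On-the-fly Gradient Modulation

Authors: Xiaokang Peng (彭小康)*, Yake Wei (卫雅珂)*, Andong Deng (邓安东), Dong Wang (王栋), Di Hu (胡迪)

Corresponding author: Di Hu (胡迪)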

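Abstract: Audio-visual learning integrates different senses and so helps build a comprehensive understanding of the world. Multiple input modalities are therefore expected to improve model performance, yet we find that they are not fully exploited even when the multimodal model outperforms its unimodal counterparts. Specifically, this paper points out that existing audio-visual discriminative models, in which a single uniform objective is designed for all modalities, can still be left with under-optimized unimodal representations, because in some scenarios the other modality dominates. To alleviate this optimization imbalance, we propose on-the-fly gradient modulation, which adaptively controls the optimization of each modality by monitoring the discrepancy between their contributions to the learning objective.

In addition, extra Gaussian noise that changes dynamically is introduced to avoid the drop in generalization that gradient modulation may cause. As a result, we achieve considerable improvement over common fusion methods on different audio-visual tasks, and this simple strategy can also boost existing multimodal methods, which demonstrates its effectiveness and versatility.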
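The abstract describes the mechanism only in words, so the following is a minimal sketch of what such a modulation step could look like, not the paper's implementation. Everything specific here is an assumption made for illustration: the per-modality contribution scores, the tanh-based coefficient, the hyperparameter `alpha`, and the choice to tie the noise scale to the spread of the current gradients.

```python
import math
import random
import statistics


def modulation_coefficients(score_audio, score_visual, alpha=0.5):
    """Return (k_audio, k_visual), gradient scaling factors in (0, 1].

    The scores are hypothetical per-batch measures of how much each
    modality contributes to the learning objective (both must be > 0).
    The tanh form and alpha are assumptions, not taken from the abstract.
    """
    ratio = score_audio / score_visual
    if ratio > 1.0:
        # Audio dominates: slow audio down, leave visual untouched.
        return 1.0 - math.tanh(alpha * (ratio - 1.0)), 1.0
    # Visual dominates (or the two are tied): slow visual down instead.
    return 1.0, 1.0 - math.tanh(alpha * (1.0 / ratio - 1.0))


def modulate(grads, k, rng):
    """Scale a flat list of gradients by k and add zero-mean Gaussian noise.

    The noise standard deviation follows the spread of the current
    gradients, so it changes from step to step (an assumed design).
    """
    sigma = statistics.pstdev(grads) if len(grads) > 1 else 0.0
    return [k * g + rng.gauss(0.0, sigma) for g in grads]


if __name__ == "__main__":
    rng = random.Random(0)
    # Audio contributes three times as much as visual in this toy batch.
    k_a, k_v = modulation_coefficients(score_audio=3.0, score_visual=1.0)
    audio_grads = modulate([0.2, -0.4, 0.1], k_a, rng)
    visual_grads = modulate([0.05, 0.02, -0.01], k_v, rng)
    print(k_a, k_v)
    print(audio_grads, visual_grads)
```

In this sketch only the dominant modality's gradients are shrunk, which gives the weaker modality's encoder relatively more room to keep improving. The coefficients are recomputed on every batch, which is one reading of "on-the-fly".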


CVPR 2022

CVPR 2022 is scheduled for June 21-24, 2022, in New Orleans, USA. CVPR stands for the IEEE Conference on Computer Vision and Pattern Recognition. Organized by the IEEE, it is a top conference in the field, covering computer vision and pattern recognition technology.
