We present a simple yet effective self-supervised framework for audio-visual representation learning, with the goal of localizing the sound source in videos. To understand what enables the model to learn useful representations, we systematically investigate the effects of data augmentation and reveal that (1) the composition of data augmentations plays a critical role, {\em i.e.}~explicitly encouraging the audio-visual representations to be invariant to various transformations~({\em transformation invariance}); and (2) enforcing geometric consistency substantially improves the quality of the learned representations, {\em i.e.}~the detected sound source should follow the same transformation applied to the input video frames~({\em transformation equivariance}). Extensive experiments demonstrate that our model significantly outperforms previous methods on two sound localization benchmarks, namely Flickr-SoundNet and VGG-Sound. In addition, we evaluate on audio retrieval and cross-modal retrieval tasks; in both cases, our self-supervised models achieve superior retrieval performance, even competitive with the supervised approach on audio retrieval. This shows that the proposed framework learns strong multi-modal representations that benefit sound localization and generalize to further applications. \textit{All code will be made available.}
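To make the two constraints concrete, we include a minimal sketch below (for illustration only, not our released implementation); it assumes a hypothetical \texttt{model(frames, audio)} that returns a localization map of shape $(B, 1, H, W)$ and uses a horizontal flip as the geometric transformation.
\begin{verbatim}
# Minimal sketch of the two constraints (illustration only, not the released
# implementation). It assumes a hypothetical model(frames, audio) returning an
# audio-visual localization map of shape (B, 1, H, W), and uses a horizontal
# flip as the geometric transformation.
import torch
import torch.nn.functional as F

def transformation_equivariance_loss(model, frames, audio):
    """The detected sound source should undergo the same geometric
    transformation that is applied to the input video frames."""
    loc_map = model(frames, audio)                    # map from original frames
    frames_flipped = torch.flip(frames, dims=[-1])    # transform the input frames
    map_from_flipped = model(frames_flipped, audio)   # map from transformed frames
    map_flipped = torch.flip(loc_map, dims=[-1])      # same transform on original map
    return F.mse_loss(map_from_flipped, map_flipped)

def transformation_invariance_loss(model, frames, frames_aug, audio):
    """Appearance augmentations (e.g. color jitter) should leave the
    audio-visual localization map unchanged."""
    return F.mse_loss(model(frames_aug, audio), model(frames, audio))

if __name__ == "__main__":
    # Stand-in model for a quick smoke test; any callable with this signature works.
    dummy = lambda frames, audio: frames.mean(dim=1, keepdim=True)
    frames = torch.rand(2, 3, 224, 224)
    audio = torch.rand(2, 1, 257, 300)
    print(transformation_equivariance_loss(dummy, frames, audio))
\end{verbatim}
Any invertible geometric transformation (e.g.~random crop-and-resize or rotation) could replace the flip in this sketch; the essential point is that the same transformation is applied consistently to the input frames and to the predicted localization map.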