Audio-Visual Segmentation - Zhuanzhi Papers


July 11, 2022

Audio-Visual Segmentation


Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

From arXiv. Accepted to ECCV 2022. Jinxing Zhou and Jianyuan Wang contributed equally; Meng Wang and Yiran Zhong are corresponding authors. Code is available at https://github.com/OpenNLPLab/AVSBench

We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with this benchmark: 1) semi-supervised audio-visual segmentation with a single sound source and 2) fully-supervised audio-visual segmentation with multiple sound sources. To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process. We also design a regularization loss to encourage the audio-visual mapping during training. Quantitative and qualitative experiments on AVSBench compare our approach to several existing methods from related tasks, demonstrating that the proposed method is promising for building a bridge between the audio and pixel-wise visual semantics.
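The idea of injecting audio semantics into each pixel can be illustrated with a small sketch. This is not the authors' interaction module (see the AVSBench repository for that); it is a generic scaled dot-product attention written in plain Python, and the function names, the residual fusion, and the toy feature values are all assumptions made for illustration. Each pixel feature attends over the audio features of several time steps and adds the attended audio vector back to itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def inject_audio(visual, audio):
    """Fuse audio features into every pixel feature (hypothetical helper).

    visual: list of pixel feature vectors (H*W entries, each of length C)
    audio:  list of audio feature vectors (T time steps, each of length C)
    Returns the fused pixel features and the per-pixel attention weights.
    """
    scale = math.sqrt(len(audio[0]))
    fused, weights = [], []
    for pixel in visual:
        # How strongly this pixel matches the audio at each time step.
        attn = softmax([dot(pixel, a) / scale for a in audio])
        # Attention-weighted sum of the audio features.
        context = [
            sum(w * a[c] for w, a in zip(attn, audio))
            for c in range(len(pixel))
        ]
        # Residual fusion (an assumption): keep the visual feature, add audio guidance.
        fused.append([p + g for p, g in zip(pixel, context)])
        weights.append(attn)
    return fused, weights


# Toy input, all values hypothetical: 2 pixels, 2 audio time steps, 2 channels.
visual = [[1.0, 0.0], [0.0, 1.0]]
audio = [[4.0, 0.0], [0.0, 4.0]]
fused, weights = inject_audio(visual, audio)
```

In this toy input the first pixel aligns with the first audio step and the second pixel with the second, so each pixel places most of its attention weight on its matching step. A real model would operate on learned feature maps and feed the fused features to a segmentation decoder.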

