改进有视听自读功能的开放域域视频在屏幕上的声音隔离 (Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention) - 专知论文

会员服务 ·

0

分离的 · MoDELS · 可辨认的 · state-of-the-art · Performance ·

2021 年 6 月 17 日

Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention

翻译：改进有视听自读功能的开放域域视频在屏幕上的声音隔离

Efthymios Tzinis,Scott Wisdom,Tal Remez,John R. Hershey

We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify limitations of previous work on audiovisual on-screen sound separation, including the simplicity and coarse resolution of spatio-temporal attention, and poor convergence of the audio separation model. Our proposed model addresses these issues using cross-modal and self-attention modules that capture audio-visual dependencies at a finer resolution over time, and by unsupervised pre-training of audio separation model. These improvements allow the model to generalize to a much wider set of unseen videos. For evaluation and semi-supervised training, we collected human annotations of on-screen audio from a large database of in-the-wild videos (YFCC100M). Our results show marked improvements in on-screen separation performance, in more general conditions than previous methods.

翻译：我们引入了最先进的视听屏幕声音分离系统,该系统能够通过观看现场视频,学习将声音与屏幕上物体分离,并将声音与屏幕上物体联系起来;我们确定了以往视听屏幕声音分离工作的局限性,包括时空注意力的简单和粗略解析,以及音频分离模式的趋同性差;我们提议的模型利用跨模式和自省模块来解决这些问题,这些模块在时间上以细小的分辨率记录视听依赖性,以及未经监督的音频分离模型培训前,这些改进使得该模型能够推广到范围更广的一组看不见视频;为了评估和半监督培训,我们从大型视频数据库(YFCC100M)收集了屏幕上声音的人文说明;我们的结果显示,与以往相比,屏幕上分离的性能在较一般的条件下明显改进。

0

相关内容

分离的

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【书籍】深度学习框架：PyTorch入门与实践（附代码）

【书籍】深度学习框架：PyTorch入门与实践（附代码）

专知会员服务

167+阅读 · 2019年10月28日

【下载】Python自然语言处理实战书籍和代码《Natural Language Processing in Action》

【下载】Python自然语言处理实战书籍和代码《Natural Language Processing in Action》

专知会员服务

80+阅读 · 2019年10月27日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CVPR 2019 | tutorial】视觉识别Visual Recognition and Beyond，Facebook|Ross Girshick，Justin Johnson（李飞飞高徒）

【CVPR 2019 | tutorial】视觉识别Visual Recognition and Beyond，Facebook|Ross Girshick，Justin Johnson（李飞飞高徒）

专知会员服务

29+阅读 · 2019年6月16日

【入门】PyTorch文本分类

【入门】PyTorch文本分类

深度学习自然语言处理

8+阅读 · 2020年2月2日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

CVPR2019 | 03-19日更新12篇论文及代码汇总（包括5篇oral，1个数据集）

CVPR2019 | 03-19日更新12篇论文及代码汇总（包括5篇oral，1个数据集）

极市平台

3+阅读 · 2019年3月19日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

【Github2.2K星】PyTorch资源列表：450个NLP/CV/SP、论文实现、教程、示例

【Github2.2K星】PyTorch资源列表：450个NLP/CV/SP、论文实现、教程、示例

新智元

6+阅读 · 2018年10月22日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

Deeper or Wider Networks of Point Clouds with Self-attention?

Arxiv

0+阅读 · 2021年8月14日

Detection and Captioning with Unseen Object Classes

Arxiv

1+阅读 · 2021年8月13日

Evolving Losses for Unsupervised Video Representation Learning

Arxiv

23+阅读 · 2020年2月26日

Unified Vision-Language Pre-Training for Image Captioning and VQA

Unified Vision-Language Pre-Training for Image Captioning and VQA

Arxiv

8+阅读 · 2019年10月3日

An End-to-End Baseline for Video Captioning

Arxiv

6+阅读 · 2019年4月4日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation

Arxiv

6+阅读 · 2018年3月30日

Video Person Re-identification by Temporal Residual Learning

Arxiv

5+阅读 · 2018年2月22日

Exploring Models and Data for Remote Sensing Image Caption Generation

Arxiv

14+阅读 · 2017年12月21日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【书籍】深度学习框架：PyTorch入门与实践（附代码）

【书籍】深度学习框架：PyTorch入门与实践（附代码）

专知会员服务

167+阅读 · 2019年10月28日

【下载】Python自然语言处理实战书籍和代码《Natural Language Processing in Action》

【下载】Python自然语言处理实战书籍和代码《Natural Language Processing in Action》

专知会员服务

80+阅读 · 2019年10月27日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CVPR 2019 | tutorial】视觉识别Visual Recognition and Beyond，Facebook|Ross Girshick，Justin Johnson（李飞飞高徒）

【CVPR 2019 | tutorial】视觉识别Visual Recognition and Beyond，Facebook|Ross Girshick，Justin Johnson（李飞飞高徒）

专知会员服务

29+阅读 · 2019年6月16日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

【入门】PyTorch文本分类

【入门】PyTorch文本分类

深度学习自然语言处理

8+阅读 · 2020年2月2日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

CVPR2019 | 03-19日更新12篇论文及代码汇总（包括5篇oral，1个数据集）

CVPR2019 | 03-19日更新12篇论文及代码汇总（包括5篇oral，1个数据集）

极市平台

3+阅读 · 2019年3月19日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

【Github2.2K星】PyTorch资源列表：450个NLP/CV/SP、论文实现、教程、示例

【Github2.2K星】PyTorch资源列表：450个NLP/CV/SP、论文实现、教程、示例

新智元

6+阅读 · 2018年10月22日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

相关论文

Deeper or Wider Networks of Point Clouds with Self-attention?

Arxiv

0+阅读 · 2021年8月14日

Detection and Captioning with Unseen Object Classes

Arxiv

1+阅读 · 2021年8月13日

Evolving Losses for Unsupervised Video Representation Learning

Arxiv

23+阅读 · 2020年2月26日

Unified Vision-Language Pre-Training for Image Captioning and VQA

Unified Vision-Language Pre-Training for Image Captioning and VQA

Arxiv

8+阅读 · 2019年10月3日

An End-to-End Baseline for Video Captioning

Arxiv

6+阅读 · 2019年4月4日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation

Arxiv

6+阅读 · 2018年3月30日

Video Person Re-identification by Temporal Residual Learning

Arxiv

5+阅读 · 2018年2月22日

Exploring Models and Data for Remote Sensing Image Caption Generation

Arxiv

14+阅读 · 2017年12月21日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

微信扫码咨询专知VIP会员