In this paper we propose a multi-modal multi-correlation learning framework for the task of audio-visual speech separation. Although extensive effort has been devoted to combining the audio and visual modalities, most previous work simply adopts a straightforward concatenation of audio and visual features. To exploit the truly useful information behind these two modalities, we define two key correlations: (1) identity correlation (between timbre and facial attributes) and (2) phonetic correlation (between phonemes and lip motion). Together, these two correlations comprise the complete information and show an advantage in separating the target speaker's voice, especially in hard cases such as same-gender speakers or similar content. For implementation, either contrastive learning or adversarial training is applied to maximize these two correlations. Both work well, while adversarial training shows its advantage by avoiding some limitations of contrastive learning. Compared with previous research, our solution demonstrates clear improvements on experimental metrics without additional complexity. Further analysis confirms the validity of the proposed architecture and its potential for future extension.
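As a rough illustration of the contrastive option mentioned above, the sketch below shows a standard symmetric InfoNCE-style loss between paired audio and visual embeddings. The embedding names (timbre_emb, face_emb, phoneme_emb, lip_emb), the temperature value, and the per-correlation usage are placeholders for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def infonce_loss(audio_emb, visual_emb, temperature=0.07):
    """Symmetric InfoNCE loss between paired audio and visual embeddings.

    audio_emb, visual_emb: (batch, dim) tensors; row i of each is assumed
    to come from the same speaker/utterance (a positive pair).
    """
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)
    logits = a @ v.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Cross-entropy pulls matched audio-visual pairs together
    # and pushes mismatched pairs apart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical usage: one such loss per correlation, e.g.
# loss_identity = infonce_loss(timbre_emb, face_emb)   # identity correlation
# loss_phonetic = infonce_loss(phoneme_emb, lip_emb)   # phonetic correlation
```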