AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that useful audio-visual speech representations can be obtained from multi-modal self-supervised embeddings. Nevertheless, it remains unclear whether such representations generalize to real-world audio-visual regression tasks, such as audio-visual speech enhancement (AVSE) and audio-visual speech separation (AVSS). In this study, we leverage the pre-trained AV-HuBERT model followed by an SE module to perform AVSE and AVSS. Comparative experimental results demonstrate that our proposed model outperforms state-of-the-art AVSE methods and traditional audio-only SE models. In summary, our results confirm the effectiveness of the proposed model for the AVSS task when combined with proper fine-tuning strategies, demonstrating that the multi-modal self-supervised embeddings obtained from AV-HuBERT can be generalized to audio-visual regression tasks.
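The abstract describes the architecture only at a high level: a pre-trained AV-HuBERT encoder that produces audio-visual embeddings, followed by an SE module that performs the actual enhancement or separation. The following is a minimal PyTorch sketch of such a pipeline, not the authors' implementation; the `AVHuBERT`-style encoder is passed in as a generic module (loading it, e.g. via fairseq, is not shown), and the mask-based SE head, the freezing strategy, and all dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class MaskSEHead(nn.Module):
    """Hypothetical SE module: maps fused AV embeddings to a spectral mask."""

    def __init__(self, embed_dim: int, n_freq_bins: int, hidden: int = 512):
        super().__init__()
        self.rnn = nn.LSTM(embed_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_freq_bins)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, frames, embed_dim) audio-visual embeddings
        h, _ = self.rnn(emb)
        return torch.sigmoid(self.proj(h))  # mask values in [0, 1]


class AVSEModel(nn.Module):
    """Pre-trained AV encoder (e.g. AV-HuBERT, loaded elsewhere) + SE head."""

    def __init__(self, av_encoder: nn.Module, embed_dim: int,
                 n_freq_bins: int, freeze_encoder: bool = True):
        super().__init__()
        self.encoder = av_encoder
        if freeze_encoder:
            # One possible fine-tuning strategy: keep the SSL encoder frozen
            # and train only the SE head; unfreezing is the other option.
            for p in self.encoder.parameters():
                p.requires_grad = False
        self.se_head = MaskSEHead(embed_dim, n_freq_bins)

    def forward(self, noisy_spec, audio_feats, video_feats):
        # noisy_spec: (batch, frames, n_freq_bins) noisy magnitude spectrogram
        # The encoder is assumed to return frame-aligned embeddings of shape
        # (batch, frames, embed_dim) from the audio and lip-video streams.
        emb = self.encoder(audio_feats, video_feats)
        mask = self.se_head(emb)
        return mask * noisy_spec  # enhanced magnitude spectrogram
```

For separation (AVSS), the same head could instead predict one mask per target speaker, conditioned on that speaker's visual stream; this variant is likewise an assumption, not a detail taken from the abstract.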