时间领域分散的超额语音培训:联合学习目标语音隔离和个人自愿、残疾、残疾、残疾和个人福利 (Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits) - 专知论文

会员服务 ·

0

Weight · 分离的 · 稀疏 · 可约的 · Processing（编程语言） ·

2021 年 9 月 27 日

Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits

翻译：时间领域分散的超额语音培训:联合学习目标语音隔离和个人自愿、残疾、残疾、残疾和个人福利

Qingjian Lin,Lin Yang,Xuyang Wang,Luyuan Xie,Chen Jia,Junjie Wang

from arxiv, Accepted by APSIPA 2021

Target speech separation is the process of filtering a certain speaker's voice out of speech mixtures according to the additional speaker identity information provided. Recent works have made considerable improvement by processing signals in the time domain directly. The majority of them take fully overlapped speech mixtures for training. However, since most real-life conversations occur randomly and are sparsely overlapped, we argue that training with different overlap ratio data benefits. To do so, an unavoidable problem is that the popularly used SI-SNR loss has no definition for silent sources. This paper proposes the weighted SI-SNR loss, together with the joint learning of target speech separation and personal VAD. The weighted SI-SNR loss imposes a weight factor that is proportional to the target speaker's duration and returns zero when the target speaker is absent. Meanwhile, the personal VAD generates masks and sets non-target speech to silence. Experiments show that our proposed method outperforms the baseline by 1.73 dB in terms of SDR on fully overlapped speech, as well as by 4.17 dB and 0.9 dB on sparsely overlapped speech of clean and noisy conditions. Besides, with slight degradation in performance, our model could reduce the time costs in inference.

翻译：目标语音分离是根据额外发言者身份信息,将某一发言者的声音从语音混合物中过滤出的过程。最近的工作通过直接处理时间范围内的信号而大大改进了时间范围内的信号,其中多数采用完全重叠的语音混合物来进行培训。然而,由于大多数真实生活中的对话是随机发生的,而且很少重叠,因此我们认为,培训与不同重叠比例数据的好处不同。为此,一个不可避免的问题是,普遍使用的SI-SNR损失对无声源没有定义。本文提议加权的SI-SNR损失,同时共同学习目标语音分离和个人VAD。加权的SI-SNR损失要求一个与目标发言者的时间长度成正比的重量系数,在目标发言者缺席时返回零。与此同时,个人VAD产生面具,设定非目标发言为沉默。实验表明,我们所提议的方法在完全重叠的演讲中比标准特别提款权的基线高出1.7 dB,以及4.17 dB和0.9 dB对清洁和噪音条件的微重的演讲比重。此外,由于性能轻微退化,我们的模型可以降低时间成本。

0

相关内容

Weight

【ICML2021】核持续学习，Kernel Continual Learning

专知会员服务

32+阅读 · 2021年7月15日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【CIKM2020】持续域自适应的机器阅读理解，Continual Domain Adaptation

【CIKM2020】持续域自适应的机器阅读理解，Continual Domain Adaptation

专知会员服务

12+阅读 · 2020年8月26日

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

专知会员服务

25+阅读 · 2020年7月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

度量学习中的pair-based loss

度量学习中的pair-based loss

极市平台

65+阅读 · 2019年7月17日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

笔记 | Deep active learning for named entity recognition

笔记 | Deep active learning for named entity recognition

黑龙江大学自然语言处理实验室

24+阅读 · 2018年5月27日

迁移学习之Domain Adaptation

迁移学习之Domain Adaptation

全球人工智能

18+阅读 · 2018年4月11日

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Arxiv

0+阅读 · 2021年11月17日

Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Arxiv

0+阅读 · 2021年11月16日

Single-channel speech separation using Soft-minimum Permutation Invariant Training

Single-channel speech separation using Soft-minimum Permutation Invariant Training

Arxiv

0+阅读 · 2021年11月16日

Generalization Bounds and Algorithms for Learning to Communicate over Additive Noise Channels

Arxiv

0+阅读 · 2021年11月16日

Joint Unsupervised and Supervised Training for Multilingual ASR

Arxiv

0+阅读 · 2021年11月15日

Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

Arxiv

0+阅读 · 2021年11月15日

Curriculum Pre-training for End-to-End Speech Translation

Arxiv

4+阅读 · 2020年4月21日

Advances in Online Audio-Visual Meeting Transcription

Advances in Online Audio-Visual Meeting Transcription

Arxiv

4+阅读 · 2019年12月10日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

【ICML2021】核持续学习，Kernel Continual Learning

专知会员服务

32+阅读 · 2021年7月15日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【CIKM2020】持续域自适应的机器阅读理解，Continual Domain Adaptation

【CIKM2020】持续域自适应的机器阅读理解，Continual Domain Adaptation

专知会员服务

12+阅读 · 2020年8月26日

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

专知会员服务

25+阅读 · 2020年7月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

热门VIP内容

开通专知VIP会员享更多权益服务

美军小型无人机项目

无人机蜂群——作为执行非常规战争的创新工具 | 2025最新文献

不确定环境下无人机与无人地面车辆编队的地下勘探规划算法 | 122页

接纳无人机多样性：西方军事在无人机战争中适应的五个挑战 | 28页报告

相关资讯

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

度量学习中的pair-based loss

度量学习中的pair-based loss

极市平台

65+阅读 · 2019年7月17日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

笔记 | Deep active learning for named entity recognition

笔记 | Deep active learning for named entity recognition

黑龙江大学自然语言处理实验室

24+阅读 · 2018年5月27日

迁移学习之Domain Adaptation

迁移学习之Domain Adaptation

全球人工智能

18+阅读 · 2018年4月11日

相关论文

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Arxiv

0+阅读 · 2021年11月17日

Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Arxiv

0+阅读 · 2021年11月16日

Single-channel speech separation using Soft-minimum Permutation Invariant Training

Single-channel speech separation using Soft-minimum Permutation Invariant Training

Arxiv

0+阅读 · 2021年11月16日

Generalization Bounds and Algorithms for Learning to Communicate over Additive Noise Channels

Arxiv

0+阅读 · 2021年11月16日

Joint Unsupervised and Supervised Training for Multilingual ASR

Arxiv

0+阅读 · 2021年11月15日

Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

Arxiv

0+阅读 · 2021年11月15日

Curriculum Pre-training for End-to-End Speech Translation

Arxiv

4+阅读 · 2020年4月21日

Advances in Online Audio-Visual Meeting Transcription

Advances in Online Audio-Visual Meeting Transcription

Arxiv

4+阅读 · 2019年12月10日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

微信扫码咨询专知VIP会员