基于多分辨率频率编码器和解码器的时域语音增强 (Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder) - 专知论文

会员服务 ·

0

多分辨率 · 时域 · 语音增强 · 非稳态 · 损失 ·

2023 年 3 月 26 日

Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder

翻译：基于多分辨率频率编码器和解码器的时域语音增强

Hao Shi,Masato Mimura,Longbiao Wang,Jianwu Dang,Tatsuya Kawahara

Time-domain speech enhancement (SE) has recently been intensively investigated. Among recent works, DEMUCS introduces multi-resolution STFT loss to enhance performance. However, some resolutions used for STFT contain non-stationary signals, and it is challenging to learn multi-resolution frequency losses simultaneously with only one output. For better use of multi-resolution frequency information, we supplement multiple spectrograms in different frame lengths into the time-domain encoders. They extract stationary frequency information in both narrowband and wideband. We also adopt multiple decoder outputs, each of which computes its corresponding resolution frequency loss. Experimental results show that (1) it is more effective to fuse stationary frequency features than non-stationary features in the encoder, and (2) the multiple outputs consistent with the frequency loss improve performance. Experiments on the Voice-Bank dataset show that the proposed method obtained a 0.14 PESQ improvement.

翻译：最近时间域语音增强（SE）得到了密切研究，其中 DEMUCS 引入多分辨率 STFT 损失以提高性能。然而，用于 STFT 的一些分辨率包含非稳态信号，仅使用一个输出同时学习多分辨率频率损失具有挑战性。为更好地利用多分辨率频率信息，我们在时域编码器中补充多个具有不同帧长度的频谱图。它们提取窄带和宽带的稳态频率信息。我们还采用多个解码器输出，每个输出计算其相应的分辨率频率损失。实验结果表明：（1）在编码器中融合稳态频率特征比非稳态特征更有效，（2）与频率损失一致的多个输出提高了性能。在 Voice-Bank 数据集上的实验表明，所提出的方法获得了 0.14 PESQ 的改进。

1

相关内容

多分辨率

【ICML2021】基于小波变换的图神经网络

专知会员服务

51+阅读 · 2021年5月19日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

专知会员服务

20+阅读 · 2020年6月11日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【清华大学】Bert 简介，Bidirectional Encoder Representations from Transformers，21页ppt

【清华大学】Bert 简介，Bidirectional Encoder Representations from Transformers，21页ppt

专知会员服务

79+阅读 · 2019年12月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

从 Encoder 到 Decoder 实现 Seq2Seq 模型

从 Encoder 到 Decoder 实现 Seq2Seq 模型

AI研习社

10+阅读 · 2018年2月10日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

基于Lowrank分解的谱方法和有限差分地震正演模拟

国家自然科学基金

0+阅读 · 2015年12月31日

非对称物理层网络编码机制设计与理论分析

国家自然科学基金

0+阅读 · 2013年12月31日

具有优异部分相关特性的参考信号序列设计及其应用

国家自然科学基金

0+阅读 · 2013年12月31日

基于稀疏时频分析与二元掩蔽估计的耳语音可懂度增强研究

国家自然科学基金

0+阅读 · 2013年12月31日

语音识别中的稀疏性深度学习

国家自然科学基金

11+阅读 · 2012年12月31日

TREM-1/DAP12/ NF-κB信号通路在6-姜烯酚抗动脉粥样硬化中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

外加注入信号的速调型相对论返波管研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于微波光子学信号处理的超快飞秒测距激光雷达

国家自然科学基金

0+阅读 · 2011年12月31日

无线多媒体传感器网络中多媒体信息处理及路由技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

异步低功耗LDPC解码器设计

国家自然科学基金

0+阅读 · 2009年12月31日

Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation

Arxiv

0+阅读 · 2023年5月17日

BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions

Arxiv

0+阅读 · 2023年5月17日

Frequency Enhanced Hybrid Attention Network for Sequential Recommendation

Arxiv

0+阅读 · 2023年5月17日

Contrastive Label Enhancement

Arxiv

0+阅读 · 2023年5月16日

MLIC: Multi-Reference Entropy Model for Learned Image Compression

Arxiv

0+阅读 · 2023年5月16日

ForkNet: Simultaneous Time and Time-Frequency Domain Modeling for Speech Enhancement

Arxiv

0+阅读 · 2023年5月15日

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation

Arxiv

11+阅读 · 2021年12月9日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation

Arxiv

15+阅读 · 2019年1月23日

SpectralNet: Spectral Clustering using Deep Neural Networks

Arxiv

11+阅读 · 2018年1月10日

VIP会员

文章信息

相关主题

相关VIP内容

【ICML2021】基于小波变换的图神经网络

专知会员服务

51+阅读 · 2021年5月19日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

专知会员服务

20+阅读 · 2020年6月11日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【清华大学】Bert 简介，Bidirectional Encoder Representations from Transformers，21页ppt

【清华大学】Bert 简介，Bidirectional Encoder Representations from Transformers，21页ppt

专知会员服务

79+阅读 · 2019年12月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

未来战场：AI赋能无人作战新范式，39页ppt

【牛津博士论文】无限维空间中的广义变分推断

DeepSeek AI 从入门到付费专家·第一卷：动手实践、真实应用与可扩展 AI 解决方案全掌握

2025中国AI Agent商业应用场景洞察研究

相关资讯

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

从 Encoder 到 Decoder 实现 Seq2Seq 模型

从 Encoder 到 Decoder 实现 Seq2Seq 模型

AI研习社

10+阅读 · 2018年2月10日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation

Arxiv

0+阅读 · 2023年5月17日

BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions

Arxiv

0+阅读 · 2023年5月17日

Frequency Enhanced Hybrid Attention Network for Sequential Recommendation

Arxiv

0+阅读 · 2023年5月17日

Contrastive Label Enhancement

Arxiv

0+阅读 · 2023年5月16日

MLIC: Multi-Reference Entropy Model for Learned Image Compression

Arxiv

0+阅读 · 2023年5月16日

ForkNet: Simultaneous Time and Time-Frequency Domain Modeling for Speech Enhancement

Arxiv

0+阅读 · 2023年5月15日

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation

Arxiv

11+阅读 · 2021年12月9日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation

Arxiv

15+阅读 · 2019年1月23日

SpectralNet: Spectral Clustering using Deep Neural Networks

Arxiv

11+阅读 · 2018年1月10日

相关基金

基于Lowrank分解的谱方法和有限差分地震正演模拟

国家自然科学基金

0+阅读 · 2015年12月31日

非对称物理层网络编码机制设计与理论分析

国家自然科学基金

0+阅读 · 2013年12月31日

具有优异部分相关特性的参考信号序列设计及其应用

国家自然科学基金

0+阅读 · 2013年12月31日

基于稀疏时频分析与二元掩蔽估计的耳语音可懂度增强研究

国家自然科学基金

0+阅读 · 2013年12月31日

语音识别中的稀疏性深度学习

国家自然科学基金

11+阅读 · 2012年12月31日

TREM-1/DAP12/ NF-κB信号通路在6-姜烯酚抗动脉粥样硬化中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

外加注入信号的速调型相对论返波管研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于微波光子学信号处理的超快飞秒测距激光雷达

国家自然科学基金

0+阅读 · 2011年12月31日

无线多媒体传感器网络中多媒体信息处理及路由技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

异步低功耗LDPC解码器设计

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员