Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR - 专知论文

会员服务 ·

0

语音识别 · 稳健性 · SSL · 表示 · 原点 ·

2023 年 4 月 23 日

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR

翻译：暂无翻译

Yuchen Hu,Chen Chen,Qiushi Zhu,Eng Siong Chng

from arxiv, 12 pages, 7 figures, Submitted to IEEE/ACM TASLP

Automatic speech recognition (ASR) has gained a remarkable success thanks to recent advances of deep learning, but it usually degrades significantly under real-world noisy conditions. Recent works introduce speech enhancement (SE) as front-end to improve speech quality, which is proved effective but may not be optimal for downstream ASR due to speech distortion problem. Based on that, latest works combine SE and currently popular self-supervised learning (SSL) to alleviate distortion and improve noise robustness. Despite the effectiveness, the speech distortion caused by conventional SE still cannot be completely eliminated. In this paper, we propose a self-supervised framework named Wav2code to implement a generalized SE without distortions for noise-robust ASR. First, in pre-training stage the clean speech representations from SSL model are sent to lookup a discrete codebook via nearest-neighbor feature matching, the resulted code sequence are then exploited to reconstruct the original clean representations, in order to store them in codebook as prior. Second, during finetuning we propose a Transformer-based code predictor to accurately predict clean codes by modeling the global dependency of input noisy representations, which enables discovery and restoration of high-quality clean representations without distortions. Furthermore, we propose an interactive feature fusion network to combine original noisy and the restored clean representations to consider both fidelity and quality, resulting in even more informative features for downstream ASR. Finally, experiments on both synthetic and real noisy datasets demonstrate that Wav2code can solve the speech distortion and improve ASR performance under various noisy conditions, resulting in stronger robustness.

翻译：暂无翻译

0

相关内容

语音识别

语音识别是计算机科学和计算语言学的一个跨学科子领域，它发展了一些方法和技术，使计算机可以将口语识别和翻译成文本。它也被称为自动语音识别（ASR），计算机语音识别或语音转文本（STT）。它整合了计算机科学，语言学和计算机工程领域的知识和研究。

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

【CVPR 2022】未知损坏的一体化图像恢复,All-In-One Image Restoration for Unknown Corruption

【CVPR 2022】未知损坏的一体化图像恢复,All-In-One Image Restoration for Unknown Corruption

专知会员服务

17+阅读 · 2022年3月28日

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

基于全基因组测序基础上的芝麻茎点枯病菌（Macrophomina phaseolina）致病相关基因分析

国家自然科学基金

0+阅读 · 2013年12月31日

正交分光成像大范围快速高精度位姿测量新方法

国家自然科学基金

0+阅读 · 2012年12月31日

轮式倒立摆的非线性建模与动力学分析

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

紫外-可见-近红外全波段抗反射三维微纳结构的构筑

国家自然科学基金

0+阅读 · 2012年12月31日

面向个性化推荐的地理信息可视化方法

国家自然科学基金

4+阅读 · 2012年12月31日

硅基薄膜太阳能电池的纳米仿生陷光结构

国家自然科学基金

0+阅读 · 2012年12月31日

亚像元制图及精度评价

国家自然科学基金

0+阅读 · 2009年12月31日

n型重掺铝氧化锌半导体纳米线近红外表面等离子体效应研究

国家自然科学基金

0+阅读 · 2008年12月31日

高分辨率极化SAR图像场景分割与标注算法研究

国家自然科学基金

0+阅读 · 2008年12月31日

Mixed Autoencoder for Self-supervised Visual Representation Learning

Arxiv

0+阅读 · 2023年6月7日

Looking and Listening: Audio Guided Text Recognition

Arxiv

1+阅读 · 2023年6月6日

CONCORD: Clone-aware Contrastive Learning for Source Code

Arxiv

0+阅读 · 2023年6月5日

Zero shot framework for satellite image restoration

Arxiv

0+阅读 · 2023年6月5日

Explore and Exploit the Diverse Knowledge in Model Zoo for Domain Generalization

Arxiv

0+阅读 · 2023年6月5日

MetAug: Contrastive Learning via Meta Feature Augmentation

Arxiv

10+阅读 · 2022年3月10日

Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation

Arxiv

11+阅读 · 2021年12月16日

Faster Meta Update Strategy for Noise-Robust Deep Learning

Arxiv

11+阅读 · 2021年4月30日

Deep Graph Structure Learning for Robust Representations: A Survey

Arxiv

21+阅读 · 2021年3月4日

Evolving Losses for Unsupervised Video Representation Learning

Arxiv

23+阅读 · 2020年2月26日

VIP会员

文章信息

相关主题

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

【CVPR 2022】未知损坏的一体化图像恢复,All-In-One Image Restoration for Unknown Corruption

【CVPR 2022】未知损坏的一体化图像恢复,All-In-One Image Restoration for Unknown Corruption

专知会员服务

17+阅读 · 2022年3月28日

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】基于多模态基础模型的上下文学习

追寻真正的AI自主性：从遗留思维到战场优势

《理解城市战及其在俄乌战争中的表现》报告

视觉-语言-动作模型解析：从模块构成到里程碑与挑战

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

相关论文

Mixed Autoencoder for Self-supervised Visual Representation Learning

Arxiv

0+阅读 · 2023年6月7日

Looking and Listening: Audio Guided Text Recognition

Arxiv

1+阅读 · 2023年6月6日

CONCORD: Clone-aware Contrastive Learning for Source Code

Arxiv

0+阅读 · 2023年6月5日

Zero shot framework for satellite image restoration

Arxiv

0+阅读 · 2023年6月5日

Explore and Exploit the Diverse Knowledge in Model Zoo for Domain Generalization

Arxiv

0+阅读 · 2023年6月5日

MetAug: Contrastive Learning via Meta Feature Augmentation

Arxiv

10+阅读 · 2022年3月10日

Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation

Arxiv

11+阅读 · 2021年12月16日

Faster Meta Update Strategy for Noise-Robust Deep Learning

Arxiv

11+阅读 · 2021年4月30日

Deep Graph Structure Learning for Robust Representations: A Survey

Arxiv

21+阅读 · 2021年3月4日

Evolving Losses for Unsupervised Video Representation Learning

Arxiv

23+阅读 · 2020年2月26日

相关基金

基于全基因组测序基础上的芝麻茎点枯病菌（Macrophomina phaseolina）致病相关基因分析

国家自然科学基金

0+阅读 · 2013年12月31日

正交分光成像大范围快速高精度位姿测量新方法

国家自然科学基金

0+阅读 · 2012年12月31日

轮式倒立摆的非线性建模与动力学分析

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

紫外-可见-近红外全波段抗反射三维微纳结构的构筑

国家自然科学基金

0+阅读 · 2012年12月31日

面向个性化推荐的地理信息可视化方法

国家自然科学基金

4+阅读 · 2012年12月31日

硅基薄膜太阳能电池的纳米仿生陷光结构

国家自然科学基金

0+阅读 · 2012年12月31日

亚像元制图及精度评价

国家自然科学基金

0+阅读 · 2009年12月31日

n型重掺铝氧化锌半导体纳米线近红外表面等离子体效应研究

国家自然科学基金

0+阅读 · 2008年12月31日

高分辨率极化SAR图像场景分割与标注算法研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员