语音识别端对端多通道变换器 (End-to-End Multi-Channel Transformer for Speech Recognition) - 专知论文

会员服务 ·

0

层 · 注意力机制 · 变换 · Integration · EDA ·

2021 年 2 月 8 日

End-to-End Multi-Channel Transformer for Speech Recognition

翻译：语音识别端对端多通道变换器

Feng-Ju Chang,Martin Radfar,Athanasios Mouchtaris,Brian King,Siegfried Kunzmann

from arxiv, Accepted by 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

Transformers are powerful neural architectures that allow integrating different modalities using attention mechanisms. In this paper, we leverage the neural transformer architectures for multi-channel speech recognition systems, where the spectral and spatial information collected from different microphones are integrated using attention layers. Our multi-channel transformer network mainly consists of three parts: channel-wise self attention layers (CSA), cross-channel attention layers (CCA), and multi-channel encoder-decoder attention layers (EDA). The CSA and CCA layers encode the contextual relationship within and between channels and across time, respectively. The channel-attended outputs from CSA and CCA are then fed into the EDA layers to help decode the next token given the preceding ones. The experiments show that in a far-field in-house dataset, our method outperforms the baseline single-channel transformer, as well as the super-directive and neural beamformers cascaded with the transformers.

翻译：在本文中,我们利用神经变压器结构将神经变压器结构用于多通道语音识别系统,从不同麦克风收集的光谱和空间信息使用注意层集成。我们的多通道变压器网络主要由三个部分组成:频道自关注层(CSA)、跨通道关注层(CCA)和多通道解码器注意层(EDA)。CSA和CCA层分别对各频道内部和之间以及不同时间的背景关系进行了编码。CSA和CCA的频道访问输出随后被输入EDA层,以帮助解码前面的下一个标记。实验显示,在远方的内部数据集中,我们的方法超越了与变压器相联的基线单通道变压器,以及超导式和神经变压器。

0

相关内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

专知会员服务

61+阅读 · 2020年5月15日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【上海交大-ICASSP2020】Transformer端到端的多说话人语音识别

【上海交大-ICASSP2020】Transformer端到端的多说话人语音识别

专知会员服务

51+阅读 · 2020年2月16日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

已删除

将门创投

3+阅读 · 2019年6月12日

Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention

Arxiv

0+阅读 · 2021年4月2日

Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays

Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays

Arxiv

0+阅读 · 2021年4月1日

Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition

Arxiv

0+阅读 · 2021年3月31日

ACTION-Net: Multipath Excitation for Action Recognition

Arxiv

3+阅读 · 2021年3月11日

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Arxiv

10+阅读 · 2020年12月31日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Arxiv

7+阅读 · 2019年4月18日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Arxiv

5+阅读 · 2018年6月4日

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Arxiv

8+阅读 · 2018年2月7日

VIP会员

文章信息

相关主题

注意力机制

相关VIP内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

专知会员服务

61+阅读 · 2020年5月15日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【上海交大-ICASSP2020】Transformer端到端的多说话人语音识别

【上海交大-ICASSP2020】Transformer端到端的多说话人语音识别

专知会员服务

51+阅读 · 2020年2月16日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【NeurIPS2025】面向时间序列基础模型的合成序列符号数据生成方法

军事通信市场七大趋势概述

【CMU博士论文】深度学习中泛化的量化、理解与改进

面向低光照图像增强的扩散模型

相关资讯

已删除

将门创投

3+阅读 · 2019年6月12日

相关论文

Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention

Arxiv

0+阅读 · 2021年4月2日

Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays

Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays

Arxiv

0+阅读 · 2021年4月1日

Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition

Arxiv

0+阅读 · 2021年3月31日

ACTION-Net: Multipath Excitation for Action Recognition

Arxiv

3+阅读 · 2021年3月11日

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Arxiv

10+阅读 · 2020年12月31日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Arxiv

7+阅读 · 2019年4月18日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Arxiv

5+阅读 · 2018年6月4日

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Arxiv

8+阅读 · 2018年2月7日

微信扫码咨询专知VIP会员