Variable Attention Masking for Configurable Transformer Transducer Speech Recognition - Zhuanzhi Papers


Masking · Recognition · Speech recognition · Transformation · Accuracy

April 18, 2023

Variable Attention Masking for Configurable Transformer Transducer Speech Recognition


Pawel Swietojanski,Stefan Braun,Dogan Can,Thiago Fraga da Silva,Arnab Ghoshal,Takaaki Hori,Roger Hsiao,Henry Mason,Erik McDermott,Honza Silovsky,Ruchir Travadi,Xiaodan Zhuang

From arXiv; to appear in ICASSP 2023

This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries, in terms of recognition accuracy and latency. We then explore the use of variable masking, where the attention masks are sampled from a target distribution at training time, to build models that can work in different configurations. Finally, we investigate how a single configurable model can be used to perform both first pass streaming recognition and second pass acoustic rescoring. Experiments show that chunked masking achieves a better accuracy vs latency trade-off compared to fixed masking, both with and without FastEmit. We also show that variable masking improves the accuracy by up to 8% relative in the acoustic re-scoring scenario.
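The three masking schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameters (`left`, `right`, `chunk_size`, `left_chunks`), and the uniform sampling over chunk sizes are all assumptions made for illustration, since the abstract does not specify the target distribution or the context sizes used.

```python
import random


def fixed_mask(num_frames, left, right):
    """Fixed masking: the same window is applied at every frame.

    mask[t][s] is True when frame t may attend to frame s,
    i.e. when s lies in [t - left, t + right].
    """
    return [
        [-left <= s - t <= right for s in range(num_frames)]
        for t in range(num_frames)
    ]


def chunked_mask(num_frames, chunk_size, left_chunks):
    """Chunked masking: the window follows chunk boundaries.

    Frame t may attend to every frame in its own chunk and in the
    `left_chunks` chunks before it, so the lookahead depends on where
    t sits inside its chunk.
    """
    return [
        [
            0 <= t // chunk_size - s // chunk_size <= left_chunks
            for s in range(num_frames)
        ]
        for t in range(num_frames)
    ]


def sample_variable_mask(num_frames, chunk_sizes, left_chunks, rng):
    """Variable masking: draw a mask configuration per training step.

    Assumption: a uniform choice over `chunk_sizes` stands in for the
    "target distribution" mentioned in the abstract.
    """
    chunk_size = rng.choice(chunk_sizes)
    return chunk_size, chunked_mask(num_frames, chunk_size, left_chunks)


if __name__ == "__main__":
    rng = random.Random(0)
    size, mask = sample_variable_mask(6, [1, 2, 3], left_chunks=1, rng=rng)
    print("sampled chunk size:", size)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
```

With a fixed mask every row has the same shape shifted along the diagonal, whereas with a chunked mask all frames in a chunk share one row pattern. Sampling the chunk size during training is one plausible way to expose a single model to several latency configurations.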

