The Performance Evaluation of Attention-Based Neural ASR under Mixed Speech Input - Zhuanzhi (专知) Papers

Phoneme · Performer · Speech Recognition · MoDELS · Error Rate

August 3, 2021

The Performance Evaluation of Attention-Based Neural ASR under Mixed Speech Input

Bradley He, Martin Radfar

from arXiv, 5 pages, 3 figures

To evaluate the performance of attention-based neural ASR under noisy conditions, the current trend is to present hours of various noisy speech data to the model and measure the overall word/phoneme error rate (W/PER). In general, it is unclear how these models perform when exposed to a cocktail-party setup in which two or more speakers are active. In this paper, we present mixtures of speech signals to a popular attention-based neural ASR, known as Listen, Attend, and Spell (LAS), at different target-to-interference ratios (TIR) and measure the phoneme error rate. In particular, we investigate in detail which phoneme is predicted when two phonemes are mixed; in this fashion we build a model that gives the most probable predictions for a phoneme. We found a 65% relative increase in PER when LAS was presented with mixed speech signals at TIR = 0 dB, and the performance approaches the unmixed scenario at TIR = 30 dB. Our results show that the model, when presented with mixed phoneme signals, tends to predict those phonemes that have higher accuracies during evaluation on the original phoneme signals.
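The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`mix_at_tir`, `phoneme_error_rate`, `relative_increase`) are hypothetical, TIR is assumed to be the ratio of average signal powers in dB, and PER is assumed to be the Levenshtein distance between phoneme sequences divided by the reference length. The abstract does not state the absolute PER values, so any numbers fed to `relative_increase` are placeholders.

```python
import math


def mix_at_tir(target, interferer, tir_db):
    """Add `interferer` to `target` after scaling it to the requested TIR.

    Assumption: TIR_dB = 10 * log10(P_target / P_interferer_scaled),
    where P is the mean squared amplitude over the overlapping samples.
    Returns the mixed samples and the gain applied to the interferer.
    """
    n = min(len(target), len(interferer))
    if n == 0:
        raise ValueError("both signals must be non-empty")
    target, interferer = target[:n], interferer[:n]
    p_target = sum(x * x for x in target) / n
    p_interf = sum(x * x for x in interferer) / n
    if p_interf == 0:
        raise ValueError("interferer has zero power")
    # Solve 10*log10(p_target / (gain^2 * p_interf)) = tir_db for gain.
    gain = math.sqrt(p_target / (p_interf * 10 ** (tir_db / 10)))
    mixed = [t + gain * i for t, i in zip(target, interferer)]
    return mixed, gain


def phoneme_error_rate(reference, hypothesis):
    """Edit distance (substitutions, insertions, deletions) / len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, 1):
        curr = [i]
        for j, hyp_ph in enumerate(hypothesis, 1):
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + (ref_ph != hyp_ph),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(reference)


def relative_increase(baseline_per, mixed_per):
    """Relative change of PER with respect to the unmixed baseline."""
    return (mixed_per - baseline_per) / baseline_per


if __name__ == "__main__":
    # Toy waveforms; real inputs would be audio sample arrays.
    mixed, gain = mix_at_tir([1.0, -1.0, 1.0, -1.0], [2.0, 2.0, -2.0, -2.0], 0)
    print(gain, mixed)
    print(phoneme_error_rate(["sh", "iy", "hh", "ae", "d"],
                             ["sh", "ih", "hh", "d"]))
```

Under these assumptions, a larger TIR shrinks the interferer gain (a 20 dB step divides it by 10), which is consistent with the abstract's observation that performance approaches the unmixed scenario at TIR = 30 dB.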
