We propose a new end-to-end neural diarization (EEND) system based on Conformer, a recently proposed neural architecture that combines convolution and self-attention to model both local and global dependencies in speech. We first show that data augmentation and convolutional subsampling layers enhance the original self-attentive EEND (SA-EEND), and that Conformer then gives an additional gain over the Transformer-based EEND. However, we notice that the Conformer-based EEND does not generalize as well from simulated to real conversation data as the Transformer-based model. This leads us to quantify the mismatch between simulated data and real speaker behavior in terms of temporal statistics reflecting turn-taking between speakers, and to investigate its correlation with diarization error. By mixing simulated and real data in EEND training, we mitigate the mismatch further, with the Conformer-based EEND achieving a 24% error reduction over the baseline SA-EEND system and a 10% improvement over the best augmented Transformer-based system on two-speaker CALLHOME data.
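To illustrate the convolutional subsampling mentioned above: such front ends typically stack two stride-2 convolutions, reducing the frame rate by roughly 4x before the self-attention layers operate on the sequence. The sketch below is a minimal NumPy illustration with hypothetical feature dimensions and random weights; it is not the paper's implementation.

```python
import numpy as np

def conv1d_subsample(x, kernel, stride=2):
    """Strided 1-D convolution along the time axis (no padding), plus ReLU.

    x: (T, D) frame sequence; kernel: (K, D, D_out) weights.
    Returns a (T', D_out) sequence with T' = (T - K) // stride + 1.
    """
    T, D = x.shape
    K, _, D_out = kernel.shape
    T_out = (T - K) // stride + 1
    y = np.empty((T_out, D_out))
    for t in range(T_out):
        window = x[t * stride : t * stride + K]        # (K, D) local context
        y[t] = np.einsum("kd,kdo->o", window, kernel)  # convolve and project
    return np.maximum(y, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 23))           # 100 frames of 23-dim features (hypothetical)
w1 = rng.standard_normal((3, 23, 64)) * 0.1  # first stride-2 conv
w2 = rng.standard_normal((3, 64, 64)) * 0.1  # second stride-2 conv
h = conv1d_subsample(conv1d_subsample(x, w1), w2)
print(h.shape)  # (24, 64): roughly a 4x reduction in frame count
```

Each stride-2 layer halves the number of frames, so the attention layers that follow see about a quarter of the original sequence length, which cuts their quadratic cost considerably.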