Speaker Diarization with LSTM - Zhuanzhi Papers

Long short-term memory network · state-of-the-art · error rate · cluster · Performance

January 23, 2022

Speaker Diarization with LSTM

Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno

From arXiv; published at ICASSP 2018

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based audio embeddings, also known as d-vectors, have consistently demonstrated superior speaker verification performance. In this paper, we build on the success of d-vector based speaker verification systems to develop a new d-vector based approach to speaker diarization. Specifically, we combine LSTM-based d-vector audio embeddings with recent work in non-parametric clustering to obtain a state-of-the-art speaker diarization system. Our system is evaluated on three standard public datasets, suggesting that d-vector based diarization systems offer significant advantages over traditional i-vector based systems. We achieved a 12.0% diarization error rate on NIST SRE 2000 CALLHOME, while our model is trained with out-of-domain data from voice search logs.
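The pipeline described in the abstract (one embedding per audio segment, then clustering without a preset speaker count) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so the greedy cosine-similarity clustering below is only a stand-in, not the authors' method. The function names, the 0.8 threshold, and the three-dimensional toy "d-vectors" are all hypothetical; real d-vectors would come from a trained LSTM.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cluster_segments(embeddings, threshold=0.8):
    """Greedy clustering stand-in (assumption, not the paper's algorithm).

    Each segment joins the most similar existing cluster if the cosine
    similarity reaches `threshold`; otherwise it starts a new speaker.
    The number of speakers is therefore not fixed in advance.
    """
    centroids = []  # running sum of member embeddings per cluster
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            centroids[best] = [a + b for a, b in zip(centroids[best], emb)]
            labels.append(best)
    return labels


# Hypothetical per-segment embeddings for a two-speaker recording.
segments = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.10, 0.90, 0.20],
    [0.00, 1.00, 0.10],
    [0.85, 0.15, 0.05],
]
labels = cluster_segments(segments)
print(labels)  # prints [0, 0, 1, 1, 0]
```

Each label is a speaker index for the corresponding segment, so the output reads as "speaker 0, speaker 0, speaker 1, speaker 1, speaker 0". A greedy pass like this is order-dependent and sensitive to the threshold; it is shown only to make the embed-then-cluster structure concrete.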

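Related content

Long short-term memory network

Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in deep learning. Unlike standard feedforward neural networks, an LSTM has feedback connections. It can process not only single data points (such as images) but also entire sequences of data (such as speech or video). For example, LSTMs are applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition, and anomaly detection in network traffic or intrusion detection systems (IDSs).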
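To make the feedback connections concrete, here is a minimal sketch of the standard LSTM cell equations, reduced to scalar input and hidden state so it runs with the Python standard library alone. The parameter names and values are hypothetical and untrained; a practical model would use vector-valued states and learned weights.

```python
import math


def sigmoid(z):
    """Logistic function, used for the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell (input and hidden size 1)."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # new cell state: keep some memory, add some input
    h = o * math.tanh(c)     # new hidden state, fed back at the next step
    return h, c


def run_lstm(sequence, p):
    """Process a whole sequence, carrying (h, c) from step to step."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


# Hypothetical, untrained parameters: all weights 0.5, all biases 0.
params = {name: 0.5 for name in ("wi", "ui", "wf", "uf", "wo", "uo", "wg", "ug")}
params.update({name: 0.0 for name in ("bi", "bf", "bo", "bg")})

outputs = run_lstm([0.5, -0.1, 0.8], params)
alone = run_lstm([0.8], params)
print(outputs)
print(alone)
```

The last element of `outputs` differs from `alone[0]` even though both steps see the input 0.8, because the hidden and cell states carry information from the earlier inputs. That dependence on history is what lets an LSTM handle whole sequences rather than isolated data points.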
