This paper proposes a novel online speaker diarization algorithm based on fully supervised self-attentive end-to-end neural diarization (SA-EEND). Online diarization inherently suffers from a speaker-permutation problem, since speaker regions may be assigned inconsistently across a recording. To resolve this inconsistency, we propose a speaker-tracing buffer mechanism that selects input frames carrying the speaker-permutation information of previous chunks and stores them in a buffer. These buffered frames are stacked with the input frames of the current chunk and fed into the self-attention network. Our method keeps the diarization outputs consistent between the buffer and the current chunk by checking the correlation between their corresponding outputs. In addition, we train SA-EEND with variable chunk sizes to mitigate the mismatch between training and inference introduced by the speaker-tracing buffer. Combining online SA-EEND with variable chunk-size training achieved DERs of 12.54% on CALLHOME and 20.77% on CSJ with an actual latency of 1.4 s.
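The buffer-to-chunk consistency check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the posterior arrays, their shapes, and the exhaustive search over speaker permutations are all assumptions. The idea is that the buffered frames are re-scored inside the current chunk, and the column permutation of the current chunk's output that best correlates with the stored outputs for those same frames is applied, keeping speaker labels consistent across chunks.

```python
from itertools import permutations

import numpy as np


def resolve_permutation(buf_probs: np.ndarray, cur_buf_probs: np.ndarray) -> tuple:
    """Pick the speaker permutation of the current chunk's output that best
    correlates with the outputs stored in the speaker-tracing buffer.

    buf_probs:     (T_buf, S) speaker posteriors stored in the buffer
                   (computed when these frames belonged to a previous chunk)
    cur_buf_probs: (T_buf, S) posteriors for the same buffered frames,
                   re-computed as part of the current chunk
    Returns the column permutation to apply to the current chunk's output.
    """
    n_speakers = buf_probs.shape[1]
    best_perm, best_corr = None, -np.inf
    for perm in permutations(range(n_speakers)):
        # Correlation between stored outputs and the permuted current outputs
        corr = float(np.sum(buf_probs * cur_buf_probs[:, perm]))
        if corr > best_corr:
            best_corr, best_perm = corr, perm
    return best_perm


# Toy example: the current chunk labeled the two speakers in swapped order,
# so the correlation check recovers the swap (1, 0).
stored = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
current = stored[:, [1, 0]]  # same frames, speaker columns swapped
print(resolve_permutation(stored, current))  # -> (1, 0)
```

The exhaustive permutation search is only practical for the small speaker counts typical of EEND-style models; the abstract itself does not specify how the permutation is selected beyond checking output correlation.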