Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization

This paper investigates a method for simulating natural conversation in the model training of end-to-end neural diarization (EEND). Due to the lack of any annotated real conversational dataset, EEND is usually pretrained on a large-scale simulated conversational dataset first and then adapted to the target real dataset. Simulated datasets play an essential role in the training of EEND, but as yet there has been insufficient investigation into an optimal simulation method. We thus propose a method to simulate natural conversational speech. In contrast to conventional methods, which simply combine the speech of multiple speakers, our method takes turn-taking into account. We define four types of speaker transition and sequentially arrange them to simulate natural conversations. The dataset simulated using our method was found to be statistically similar to the real dataset in terms of the silence and overlap ratios. The experimental results on two-speaker diarization using the CALLHOME and CSJ datasets showed that the simulated dataset contributes to improving the performance of EEND.
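The idea of arranging utterances sequentially and then comparing silence and overlap ratios can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the four transition types or define the ratios, so the signed-gap model (positive gap = silence, negative gap = overlap) and both ratio definitions (time with no active speaker, and time with two or more active segments, each divided by total duration) are assumptions made here for illustration. The names `arrange` and `silence_overlap_ratios` are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (speaker, start_sec, end_sec)


def arrange(utterances: List[Tuple[str, float]], gaps: List[float]) -> List[Segment]:
    """Place utterances one after another on a shared timeline.

    gaps[i] is the signed gap before utterance i + 1 (an assumed model):
    a positive value inserts silence, a negative value makes the new
    utterance start before the latest end time, producing overlap.
    """
    segments: List[Segment] = []
    cursor = 0.0  # latest end time seen so far
    for i, (speaker, duration) in enumerate(utterances):
        start = cursor if i == 0 else max(0.0, cursor + gaps[i - 1])
        end = start + duration
        segments.append((speaker, start, end))
        cursor = max(cursor, end)
    return segments


def silence_overlap_ratios(segments: List[Segment]) -> Tuple[float, float]:
    """Return (silence_ratio, overlap_ratio) over the whole recording.

    Assumed definitions: silence is time with no active segment, overlap
    is time with two or more active segments; both are divided by the
    span from the first start to the last end.
    """
    events = []
    for _, start, end in segments:
        events.append((start, 1))
        events.append((end, -1))
    # At equal timestamps -1 sorts before +1, so segments that merely
    # touch are not counted as overlapping.
    events.sort()

    total = events[-1][0] - events[0][0]
    if total <= 0:
        return 0.0, 0.0

    silence = overlap = 0.0
    active = 0
    prev = events[0][0]
    for t, delta in events:
        span = t - prev
        if active == 0:
            silence += span
        elif active >= 2:
            overlap += span
        active += delta
        prev = t
    return silence / total, overlap / total


if __name__ == "__main__":
    # Two speakers: a 0.5 s pause before B, then A starts 0.5 s before B ends.
    segs = arrange([("A", 2.0), ("B", 1.0), ("A", 2.0)], gaps=[0.5, -0.5])
    print(segs)
    print(silence_overlap_ratios(segs))
```

Computing the same two ratios on a real corpus and on a simulated one gives a simple way to check how closely the simulation matches real conversational timing, which is the kind of comparison the abstract reports.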

