This paper introduces an online speaker diarization system that can process long audio recordings with low latency. First, we build a new variant of agglomerative hierarchical clustering that clusters speakers in an online fashion. Second, we propose a speaker embedding graph and use it in a graph-based reclustering method to further improve performance. Finally, we introduce a label matching algorithm that keeps speaker labels consistent over time. We evaluate the system on the DIHARD3 and VoxConverse datasets, which contain long recordings covering a wide range of scenarios. The experimental results show that our online diarization system outperforms the baseline offline system and performs comparably to our own offline system.
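To illustrate why label matching is needed after reclustering, the following is a minimal sketch and not the algorithm used in this work. It assumes that cluster IDs are non-negative integers, that reclustering returns one ID per segment, and that the first segments of the new output correspond to the segments already labeled. The function name `match_labels` and the greedy overlap rule are hypothetical choices made for illustration only.

```python
from collections import Counter


def match_labels(prev_labels, new_labels):
    """Rename the cluster IDs in new_labels to agree with prev_labels.

    prev_labels: IDs already emitted for the first len(prev_labels) segments.
    new_labels:  IDs after reclustering; may cover additional, newer segments.
    Greedy illustration only; it does not guarantee an optimal assignment.
    """
    # Count how often each (new ID, previous ID) pair co-occurs on the
    # segments that both labelings cover (zip stops at the shorter list).
    overlap = Counter(zip(new_labels, prev_labels))

    mapping = {}
    used_prev = set()
    # Visit pairs from the largest overlap to the smallest; ties are
    # broken by the ID values so the result is deterministic.
    for (new, prev), _ in sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0])):
        if new not in mapping and prev not in used_prev:
            mapping[new] = prev
            used_prev.add(prev)

    # Clusters with no counterpart are treated as new speakers and get
    # fresh IDs above every previously emitted ID.
    next_id = max(prev_labels, default=-1) + 1
    for new in sorted(set(new_labels)):
        if new not in mapping:
            mapping[new] = next_id
            next_id += 1

    return [mapping[label] for label in new_labels]


if __name__ == "__main__":
    previous = [0, 0, 1, 1, 1]
    reclustered = [1, 1, 0, 0, 0, 0, 2]  # same grouping, permuted IDs, two new segments
    print(match_labels(previous, reclustered))
```

In this toy case the reclustered IDs 1 and 0 are swapped back to match the earlier output, and the unseen cluster is assigned a fresh ID. An optimal one-to-one assignment (for example, with the Hungarian algorithm) could replace the greedy pass when overlaps are ambiguous.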