In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). EEND systems have shown promising performance compared with traditional clustering-based methods, especially on overlapping speech. To further improve EEND, we propose a novel multitask learning framework that jointly solves speaker diarization and a selected subtask while explicitly modeling the dependency between the two tasks. Based on the probabilistic chain rule, we optimize speaker diarization conditioned on subtasks of speaker diarization, namely speech activity detection and overlap detection. Experimental results show that the proposed method can leverage a subtask to model speaker diarization effectively, and that it outperforms conventional EEND systems in terms of diarization error rate.
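The chain-rule conditioning can be illustrated with a small sketch. Writing y for the frame-level speaker activity labels and z for the subtask labels, the joint objective factorizes as -log p(y, z | X) = -log p(z | X) - log p(y | z, X). The code below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes binary frame-level labels, binary cross-entropy losses, and a permutation-invariant loss for the speaker outputs, as is usual in EEND. It also assumes that the diarization posteriors were produced by a model conditioned on the reference subtask labels (teacher forcing). All function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence

Labels = List[List[int]]   # [frames][speakers], 1 = speaker active (assumed format)
Probs = List[List[float]]  # same shape, predicted posteriors


def subtask_labels(y: Labels, task: str) -> List[int]:
    """Derive frame-level subtask targets from speaker activity labels.

    'sad': 1 if at least one speaker is active (speech activity detection).
    'osd': 1 if two or more speakers are active (overlap detection).
    """
    threshold = 1 if task == "sad" else 2
    return [int(sum(frame) >= threshold) for frame in y]


def bce(p: float, target: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one frame, i.e. -log p(target)."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p) if target == 1 else -math.log(1.0 - p)


def pit_bce(y: Labels, p: Probs) -> float:
    """Permutation-invariant BCE: the minimum summed loss over speaker orderings."""
    n_spk = len(y[0])
    best = math.inf
    for perm in itertools.permutations(range(n_spk)):
        # Reference speaker s is matched to output channel perm[s].
        loss = sum(bce(p[t][perm[s]], y[t][s])
                   for t in range(len(y)) for s in range(n_spk))
        best = min(best, loss)
    return best


def chain_rule_loss(y: Labels, p_sub: Sequence[float],
                    p_diar_given_sub: Probs, task: str = "sad") -> float:
    """-log p(y, z | X) = -log p(z | X) - log p(y | z, X)."""
    z = subtask_labels(y, task)
    sub_loss = sum(bce(p_sub[t], z[t]) for t in range(len(z)))  # -log p(z | X)
    diar_loss = pit_bce(y, p_diar_given_sub)                     # -log p(y | z, X)
    return sub_loss + diar_loss


if __name__ == "__main__":
    y = [[1, 0], [1, 1], [0, 0]]
    print(subtask_labels(y, "sad"), subtask_labels(y, "osd"))
```

Because z is a deterministic function of y, the joint probability p(y, z | X) equals p(y | X) for consistent labels. The factorization therefore changes only how the model is structured and trained, not the target distribution.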