This paper provides a detailed description of the Hitachi-JHU system submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble of five subsystems: two x-vector-based subsystems, two end-to-end neural diarization-based subsystems, and one hybrid subsystem. We refined each subsystem so that all five are competitive and complementary. After DOVER-Lap-based system combination, the system achieved diarization error rates of 11.58 % and 14.09 % in Track 1 full and core, and 16.94 % and 20.01 % in Track 2 full and core, respectively. With these results, we won second place in all tasks of the challenge.