DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain. Speaker diarization was evaluated under two speech activity conditions (diarization from a reference speech activity vs. diarization from scratch) and 11 diverse domains. The domains span a range of recording conditions and interaction types, including read audio-books, meeting speech, clinical interviews, web videos, and, for the first time, conversational telephone speech. A total of 30 organizations (forming 21 teams) from industry and academia submitted 499 valid system outputs. The evaluation results indicate that speaker diarization has improved markedly since DIHARD I, particularly for two-party interactions, but that for many domains (e.g., web video) the problem remains far from solved.