Diarization error rate (DER) is the primary metric for evaluating diarization performance, but it faces a dilemma: errors in short utterances or segments tend to be overwhelmed by errors in longer ones. Short segments, e.g., "yes" or "no," still carry semantic information. DER also overlooks errors for speakers who talk less. Although the Jaccard error rate (JER) balances errors across speakers, it suffers from the same dilemma. Considering that duration error, segment error, and speaker-weighted error together constitute a complete diarization evaluation, we propose a Balanced Error Rate (BER) to evaluate speaker diarization. First, we propose a segment-level error rate (SER) that uses connected sub-graphs and an adaptive IoU threshold to obtain accurate segment matching. Second, to evaluate diarization in a unified way, we adopt a speaker-specific harmonic mean between duration and segment errors, followed by a speaker-weighted average. Third, we analyze our metric with a modularized system, EEND, and a multi-modal method on real datasets. SER and BER are publicly available at https://github.com/X-LANCE/BER.
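To make the combination step concrete, the following is a minimal sketch of how a per-speaker harmonic mean and a speaker-level average might be computed, together with the interval IoU that segment matching relies on. It is not the released implementation: the function names (`interval_iou`, `harmonic_mean`, `balanced_error_sketch`), the uniform default speaker weights, and the handling of zero errors are all assumptions made for illustration, and the connected sub-graph matching and adaptive IoU threshold are not reproduced here.

```python
from typing import Optional, Sequence, Tuple


def interval_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two time intervals given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def harmonic_mean(x: float, y: float) -> float:
    """Standard harmonic mean 2xy / (x + y); returns 0 when both inputs are 0
    (an assumption of this sketch)."""
    return 2.0 * x * y / (x + y) if (x + y) > 0 else 0.0


def balanced_error_sketch(
    duration_errors: Sequence[float],
    segment_errors: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine per-speaker duration and segment error rates.

    Each speaker's two error rates are merged with a harmonic mean, then the
    per-speaker values are averaged. Uniform weights are assumed by default;
    the weighting used by the actual metric may differ.
    """
    if len(duration_errors) != len(segment_errors):
        raise ValueError("one duration error and one segment error per speaker")
    n = len(duration_errors)
    if n == 0:
        return 0.0
    if weights is None:
        weights = [1.0 / n] * n  # hypothetical: every speaker counts equally
    per_speaker = [harmonic_mean(d, s) for d, s in zip(duration_errors, segment_errors)]
    return sum(w * e for w, e in zip(weights, per_speaker))
```

For example, `interval_iou((0.0, 2.0), (1.0, 3.0))` compares a 2-second reference segment with a hypothesis shifted by one second; the two intervals share 1 second out of a 3-second union. Averaging per speaker rather than per second is what keeps a speaker with little talk time from being drowned out by a dominant one.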