Adequately evaluating sound event detection (SED) systems is far from trivial and remains the subject of ongoing research. The recently proposed polyphonic sound detection receiver operating characteristic (PSD-ROC) and PSD score (PSDS) take an important step toward an evaluation of SED systems that is independent of any particular decision threshold. This yields a more complete picture of overall system behavior that is less biased by threshold tuning. Yet, the PSD-ROC is currently only approximated using a finite set of thresholds. The choice of the thresholds used in this approximation, however, can have a severe impact on the resulting PSDS. In this paper, we propose a method that computes system performance on an evaluation set for all possible thresholds jointly, enabling accurate computation not only of the PSD-ROC and PSDS but also of other collar-based and intersection-based performance curves. It further allows selecting the threshold that best fulfills the requirements of a given application. Source code is publicly available in our SED evaluation package sed_scores_eval.
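To make the contrast between a finite threshold grid and a joint evaluation over all thresholds concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the proposed method and does not use the sed_scores_eval API: it treats a plain binary detection task with one score per sample rather than collar-based or intersection-based event matching, and it uses trapezoidal integration as an illustrative choice. The function names (`roc_all_thresholds`, `roc_on_grid`, `area_under`) and the toy data are hypothetical.

```python
from itertools import groupby


def roc_all_thresholds(scores, labels):
    """Return (threshold, fpr, tpr) for every distinct score used as threshold.

    A sample counts as detected when score >= threshold. Sorting once and
    accumulating counts gives all operating points in a single pass.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(float("inf"), 0.0, 0.0)]  # threshold above all scores
    tp = fp = 0
    for thr, group in groupby(ranked, key=lambda p: p[0]):
        for _, y in group:  # tied scores change the counts together
            tp += y
            fp += 1 - y
        points.append((thr, fp / n_neg, tp / n_pos))
    return points


def roc_on_grid(scores, labels, thresholds):
    """Same curve, but evaluated only at a finite set of thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(float("inf"), 0.0, 0.0)]
    for thr in sorted(thresholds, reverse=True):
        tp = sum(y for s, y in zip(scores, labels) if s >= thr)
        fp = sum(1 - y for s, y in zip(scores, labels) if s >= thr)
        points.append((thr, fp / n_neg, tp / n_pos))
    points.append((float("-inf"), 1.0, 1.0))  # threshold below all scores
    return points


def area_under(points):
    """Trapezoidal area under the (fpr, tpr) curve (illustrative choice)."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): one score and one binary label per sample.
scores = [0.95, 0.9, 0.85, 0.6, 0.55, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

exact = roc_all_thresholds(scores, labels)
coarse = roc_on_grid(scores, labels, [0.25, 0.5, 0.75])
print("area, all thresholds:", area_under(exact))
print("area, 3-point grid:  ", area_under(coarse))
```

On this toy data the two areas differ, because the grid skips operating points that lie between its thresholds. This mirrors, in a much simpler setting, why the choice of thresholds can affect a summary score derived from an approximated curve.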