The performance of speaker recognition systems is evaluated on test trials. Although trials are as crucial to this evaluation as rulers are to tailors, they have received little careful attention so far, and most existing benchmarks compose trials by naive cross-pairing. In this paper, we argue that the cross-pairing approach produces an overwhelming number of easy trials, which in turn can bias comparisons between systems and techniques. To address this problem, we advocate paying more attention to hard trials. We present an SVM-based approach to identifying hard trials and use it to construct new evaluation sets for VoxCeleb1 and SITW. With the new sets, we re-evaluate the contribution of several recent techniques. The code and the identified hard trials will be published online at http://project.cslt.org.
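To make the cross-pairing argument concrete, here is a minimal Python sketch. It assumes a simplified protocol in which every utterance is paired with every other utterance; the actual VoxCeleb1 and SITW trial lists follow their own protocols. The function names `cross_pair_trials` and `count_trials` and the toy corpus are hypothetical illustrations. The sketch shows that, with a fixed number of utterances per speaker, target trials grow linearly with the number of speakers while non-target trials grow quadratically, so non-target pairs quickly dominate the list.

```python
from itertools import combinations
from typing import List, Tuple

Trial = Tuple[str, str, bool]


def cross_pair_trials(utterances: List[Tuple[str, str]]) -> List[Trial]:
    """Naive cross-pairing (hypothetical helper).

    utterances: list of (utterance_id, speaker_id).
    Returns every unordered pair as (utt_a, utt_b, is_target), where
    is_target is True when both utterances come from the same speaker.
    """
    return [
        (utt_a, utt_b, spk_a == spk_b)
        for (utt_a, spk_a), (utt_b, spk_b) in combinations(utterances, 2)
    ]


def count_trials(trials: List[Trial]) -> Tuple[int, int]:
    """Return (number of target trials, number of non-target trials)."""
    n_target = sum(1 for _, _, is_target in trials if is_target)
    return n_target, len(trials) - n_target


# Hypothetical toy corpus: 4 speakers with 3 utterances each (12 utterances).
utts = [(f"spk{s}_utt{u}", f"spk{s}") for s in range(4) for u in range(3)]
trials = cross_pair_trials(utts)
print(len(trials), count_trials(trials))  # 66 (12, 54)
```

Doubling the toy corpus to 8 speakers gives 24 target and 252 non-target trials. This sketch only counts trials; it does not measure how hard each pair is, which is what the SVM-based selection described above is meant to address.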