This document briefly describes the National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) conversational telephone speech (CTS) Superset. The CTS Superset was created to provide the research community with a large-scale dataset, along with uniform metadata, that can be used to effectively train and develop telephony (narrowband) speaker recognition systems. It contains a large number of telephony speech segments from more than 6800 speakers, with speech durations distributed uniformly in the [10s, 60s] range. The segments were extracted from the source corpora used to compile prior SRE datasets (SRE1996-2012), including the Greybeard corpus as well as the Switchboard and Mixer series collected by the Linguistic Data Consortium (LDC). We also report speaker recognition results on the NIST 2020 CTS Speaker Recognition Challenge, obtained using a system trained with the CTS Superset. The results will serve as a reference baseline for the challenge.