We propose AVASpeech-SMAD, a dataset to support research on speech and music activity detection (SMAD). The proposed dataset extends the existing AVASpeech dataset, which originally consists of 45 hours of audio with speech activity labels, by adding frame-level music labels. To the best of our knowledge, AVASpeech-SMAD is the first open-source dataset that features strong polyphonic labels for both music and speech. The dataset was manually annotated and verified via an iterative cross-checking process, and a simple automatic examination was also implemented to further improve label quality. Evaluation results from two state-of-the-art SMAD systems are provided as a benchmark for future reference.