This paper empirically investigates how different data splits and splitting strategies influence the performance of dysfluency detection systems. To detect dysfluencies, we perform experiments with wav2vec 2.0 models with a classification head, as well as with support vector machines (SVMs) trained on features extracted from the wav2vec 2.0 model. We train and evaluate the systems on different non-speaker-exclusive and speaker-exclusive splits of the Stuttering Events in Podcasts (SEP-28k) dataset to show how much results vary with the partitioning method used. Furthermore, we show that the SEP-28k dataset is dominated by only a few speakers, which makes evaluation difficult. To remedy this problem, we created SEP-28k-Extended (SEP-28k-E), which contains semi-automatically generated speaker and gender information for the SEP-28k corpus, and we suggest different data splits, each useful for evaluating different aspects of dysfluency detection methods.
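The difference between a speaker-exclusive and a non-speaker-exclusive split can be illustrated with a minimal sketch. It is not the partitioning procedure used for SEP-28k-E; the function name `speaker_exclusive_split`, the `(clip_id, speaker_id)` input format, and the toy data are assumptions made purely for illustration. The idea is that whole speakers, rather than individual clips, are assigned to a partition, so no speaker appears in both training and test data.

```python
import random
from collections import defaultdict


def speaker_exclusive_split(clips, test_fraction=0.2, seed=0):
    """Split (clip_id, speaker_id) pairs so that no speaker is in both sets.

    Hypothetical helper for illustration only.
    """
    # Group clip ids by the speaker they belong to.
    by_speaker = defaultdict(list)
    for clip_id, speaker_id in clips:
        by_speaker[speaker_id].append(clip_id)

    # Shuffle speakers (not clips) reproducibly.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    # Assign whole speakers to the test set until it reaches the target size;
    # all remaining speakers go to the training set.
    target = test_fraction * len(clips)
    train, test = [], []
    for speaker in speakers:
        bucket = test if len(test) < target else train
        bucket.extend(by_speaker[speaker])
    return train, test


# Toy data (assumed): 50 clips spread evenly over 5 speakers.
clips = [(f"clip{i}", f"spk{i % 5}") for i in range(50)]
train, test = speaker_exclusive_split(clips)

speaker_of = dict(clips)
train_speakers = {speaker_of[c] for c in train}
test_speakers = {speaker_of[c] for c in test}
```

Because speakers are assigned as a unit, a single speaker with a very large share of the clips can push a partition far past its target size. This is one reason a corpus dominated by a few speakers is hard to partition and evaluate fairly.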