Specially adapted speech recognition models are necessary to handle stuttered speech. For these models to be used in a targeted manner, stuttered speech must be reliably detected. Recent works have treated stuttering as a multi-class classification problem or have viewed the detection of each dysfluency type as an isolated task. Neither approach captures the nature of stuttering, in which one dysfluency seldom occurs alone but tends to co-occur with others. This work explores an approach based on a modified wav2vec 2.0 system for end-to-end stuttering detection and classification, framed as a multi-label problem. The method is evaluated on combinations of three datasets containing English and German stuttered speech and yields state-of-the-art results for stuttering detection on the SEP-28k-Extended dataset. Experimental results provide evidence for the transferability of features and the generalizability of the method across datasets and languages.
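The difference between a multi-class and a multi-label formulation can be illustrated with a minimal sketch in plain Python. It assumes a classification head (for example, on pooled wav2vec 2.0 features) that emits one logit per dysfluency type. The label names, the 0.5 threshold, and the helper functions are hypothetical and are not taken from the described system. Instead of a softmax, which forces exactly one class per clip, each label gets its own sigmoid and binary cross-entropy term, so a single clip can be tagged with several co-occurring dysfluencies.

```python
import math
from typing import List, Sequence

# Hypothetical label set for illustration only; not the exact label inventory of the described system.
LABELS = ["block", "prolongation", "sound_rep", "word_rep", "interjection"]


def multi_hot(active: Sequence[str], labels: Sequence[str] = LABELS) -> List[float]:
    """Encode the set of dysfluencies present in a clip as a multi-hot target vector."""
    unknown = set(active) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if name in active else 0.0 for name in labels]


def sigmoid(x: float) -> float:
    """Independent per-label probability (no normalization across labels, unlike softmax)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over labels; each label is its own yes/no decision."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def predict(logits: Sequence[float], threshold: float = 0.5,
            labels: Sequence[str] = LABELS) -> List[str]:
    """Return every label whose probability exceeds the threshold (zero, one, or several)."""
    return [name for name, z in zip(labels, logits) if sigmoid(z) > threshold]


if __name__ == "__main__":
    # Hypothetical logits for one clip, as a classification head might produce them.
    logits = [2.0, -1.0, 1.5, -3.0, -0.5]
    target = multi_hot(["block", "sound_rep"])
    print(predict(logits))                    # ['block', 'sound_rep']
    print(round(bce_with_logits(logits, target), 4))
```

Because the labels are scored independently, the prediction for this clip contains two dysfluency types at once, which a softmax-based multi-class head could not express.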