Stuttering is a varied speech disorder that impairs an individual's ability to communicate. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such atypical speech, or tracking the effectiveness of speech therapy, requires systems that can detect dysfluencies and, at the same time, the speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 for stuttering classification on a sizeable English corpus of stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on FluencyBank and the German therapy-centric Kassel State of Fluency (KSoF) dataset by training Support Vector Machine classifiers on features extracted from the fine-tuned models for six stuttering-related event types: blocks, prolongations, sound repetitions, word repetitions, interjections, and, specific to therapy, speech modifications. Using embeddings from the fine-tuned models leads to relative classification performance gains of up to 27% in F1-score.
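To make the reported metric concrete, the following minimal sketch shows how a relative F1 gain can be computed from per-class detection counts. The function names and all counts are hypothetical and chosen only for illustration; they are not results from the paper, and the paper's exact evaluation protocol may differ.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives.

    Uses F1 = 2*TP / (2*TP + FP + FN) and returns 0.0 when the
    denominator is zero (no positives predicted or present).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline` (0.1 means +10%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical counts for one event type (e.g. blocks); NOT from the paper.
baseline_f1 = f1_score(tp=40, fp=30, fn=30)   # general-purpose features
finetuned_f1 = f1_score(tp=50, fp=25, fn=20)  # fine-tuned features
gain = relative_gain(baseline_f1, finetuned_f1)

print(f"baseline F1:   {baseline_f1:.3f}")
print(f"fine-tuned F1: {finetuned_f1:.3f}")
print(f"relative gain: {gain:.1%}")
```

A relative gain is measured against the baseline score, so the same absolute F1 improvement yields a larger relative gain for event types on which the baseline is weak.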