Stuttering is a varied speech disorder that impairs an individual's ability to communicate. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech, or tracking the effectiveness of speech therapy, requires systems that can detect dysfluencies while also detecting speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 [1] for the classification of stuttering on a sizeable English corpus of stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on FluencyBank [2] and the German therapy-centric Kassel State of Fluency (KSoF) [3] dataset by training Support Vector Machine classifiers on features extracted from the fine-tuned models for six different stuttering-related event types: blocks, prolongations, sound repetitions, word repetitions, interjections, and, specific to therapy, speech modifications. Using embeddings from the fine-tuned models leads to relative classification performance gains of up to 27% w.r.t. F1-score.
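To illustrate the feature-extraction-plus-SVM pipeline described above, the following minimal sketch mean-pools wav2vec 2.0 hidden states into utterance-level embeddings and fits an SVM on them. The base checkpoint name, the pooling strategy, and the toy clips and labels are assumptions for illustration only; they are not the fine-tuned multi-task models or corpora used in the paper.

```python
# Minimal sketch (assumed setup, not the authors' exact pipeline):
# pool wav2vec 2.0 hidden states into one embedding per clip and
# train an SVM to classify stuttering-related event types.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.svm import SVC

# Generic base checkpoint as a placeholder; the paper fine-tunes
# wav2vec 2.0 on a large English stuttering corpus first.
MODEL_NAME = "facebook/wav2vec2-base"

extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
model = Wav2Vec2Model.from_pretrained(MODEL_NAME).eval()

def embed(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Mean-pool the final hidden states into a fixed-size embedding."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, T, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy stand-in data: random 3-second clips at 16 kHz with made-up labels,
# only so the snippet runs end to end; real data would come from
# FluencyBank or KSoF segments.
rng = np.random.default_rng(0)
clips = [rng.standard_normal(3 * 16000).astype(np.float32) for _ in range(8)]
labels = ["block", "prolongation", "sound_rep", "word_rep",
          "interjection", "modified", "block", "prolongation"]

X = np.stack([embed(c) for c in clips])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:2]))
```

In the paper's setting, one such classifier is trained per event type on embeddings from the fine-tuned models; the sketch above collapses this into a single multi-class SVM purely for brevity.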