Stuttering is a neurodevelopmental speech impairment characterized by uncontrolled utterances (interjections) and core behaviors (blocks, repetitions, and prolongations), and is caused by a failure of speech sensorimotor function. Because of this complexity, stuttering detection (SD) is a difficult task. Detecting stuttering at an early stage could help speech therapists observe and correct the speech patterns of persons who stutter (PWS). Stuttered speech from PWS is usually available only in limited amounts and is highly imbalanced across classes. We address the class imbalance problem in the SD domain with a multi-branched (MB) training scheme and by weighting the contribution of each class in the overall loss function, which yields a substantial improvement on the stuttering classes of the SEP-28k dataset over the baseline (StutterNet). To tackle data scarcity, we investigate the effectiveness of data augmentation on top of the multi-branched training scheme. Augmented training outperforms the MB StutterNet trained on clean data by a relative margin of 4.18% in macro F1-score (F1). In addition, we propose a multi-contextual (MC) StutterNet, which exploits different contexts of the stuttered speech and gives an overall improvement of 4.48% in F1 over the single-context MB StutterNet. Finally, we show that applying data augmentation in the cross-corpora scenario improves overall SD performance by a relative margin of 13.23% in F1 over clean training.
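To make the idea of weighting class contributions in the loss concrete, the following is a minimal sketch in plain Python. It is not the StutterNet implementation: the inverse-frequency weighting rule, the function names, and the toy two-class labels ("fluent", "block") are all assumptions chosen for illustration, since the abstract does not state the exact weighting used.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical weighting rule: w_c = N / (K * count_c).

    Rare classes receive larger weights; a perfectly balanced
    dataset gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted cross-entropy, normalised by the sum of weights.

    probs  : list of dicts mapping class name -> predicted probability
    labels : list of true class names
    weights: dict mapping class name -> loss weight
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy imbalanced data (assumed for illustration): 8 fluent clips, 2 blocks.
labels = ["fluent"] * 8 + ["block"] * 2
weights = inverse_frequency_weights(labels)

# A classifier that is completely uncertain on every clip.
uniform = [{"fluent": 0.5, "block": 0.5} for _ in labels]
loss_uniform = weighted_cross_entropy(uniform, labels, weights)

# A classifier biased towards the majority class.
biased = [{"fluent": 0.9, "block": 0.1} for _ in labels]
loss_biased = weighted_cross_entropy(biased, labels, weights)
unweighted_biased = weighted_cross_entropy(
    biased, labels, {"fluent": 1.0, "block": 1.0}
)
```

With these weights, the majority-biased classifier is penalised more heavily than under the unweighted loss, because each error on the rare class counts four times as much as an error on the frequent one. This is the general effect that class weighting is meant to achieve.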