This paper introduces StutterNet, a novel deep-learning-based stuttering detection system capable of detecting and identifying multiple types of disfluencies. Most existing work in this domain combines automatic speech recognition (ASR) with language models for stuttering detection. In contrast to such ASR-dependent approaches, our method relies solely on the acoustic signal. We use a time-delay neural network (TDNN), which is well suited to capturing the contextual aspects of disfluent utterances. We evaluate our system on the UCLASS stuttering dataset, which comprises more than 100 speakers. Our method achieves promising results and outperforms the state-of-the-art residual-neural-network-based method. The proposed method also has substantially fewer trainable parameters, owing to the parameter-sharing scheme of the TDNN.
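As a brief illustration of why a TDNN keeps the parameter count small (this is a minimal NumPy sketch under assumed dimensions, not the paper's actual implementation): a TDNN layer splices a few input frames at fixed time offsets and applies one shared weight matrix at every time step, so the parameters do not grow with utterance length. The context offsets, layer sizes, and names below are hypothetical.

```python
import numpy as np

def tdnn_layer(x, W, b, context=(-2, 0, 2)):
    """One TDNN layer: each output frame t is a linear map of the input
    frames at offsets t+c for c in `context`, with W and b shared across
    all time steps (the parameter sharing the abstract refers to).
    x: (T, D_in) frame sequence; W: (len(context)*D_in, D_out); b: (D_out,).
    Returns (T', D_out), dropping edge frames that lack full context."""
    T, _ = x.shape
    lo, hi = min(context), max(context)
    out = []
    for t in range(-lo, T - hi):
        spliced = np.concatenate([x[t + c] for c in context])  # splice context frames
        out.append(np.maximum(spliced @ W + b, 0.0))           # affine map + ReLU
    return np.stack(out)

# Tiny example with hypothetical sizes: 10 frames of 4-dim features -> 8-dim output.
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 4))
W = rng.standard_normal((3 * 4, 8)) * 0.1
b = np.zeros(8)
y = tdnn_layer(x, W, b)
print(y.shape)  # (6, 8): frames at the edges lack the full +/-2 context
```

Stacking such layers widens the effective temporal context multiplicatively while the weight matrices stay fixed in size, which is consistent with the reduced trainable-parameter count claimed above.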