This first-of-its-kind paper presents PASAD, a novel approach that detects changes in the perceptually fluent speech acoustics of young children. In particular, analyzing perceptually fluent speech makes it possible to identify the speech-motor-control factors that are considered the underlying cause of stuttering disfluencies. Recent studies indicate that the speech production of young children, especially those who stutter, may be adversely affected by situational physiological arousal. A major contribution of this paper is leveraging the speaker's situational physiological responses in real time to analyze the speech signal effectively. PASAD adapts a Hyper-Network structure that uses physiological parameters to extract temporal speech-importance information. In addition, a novel non-local acoustic spectrogram feature extraction network identifies meaningful acoustic attributes. Finally, a sequential network combines these acoustic attributes with the extracted temporal speech importance for effective classification. We collected speech and physiological sensing data from 73 preschool-age children who stutter (CWS) and who do not stutter (CWNS) under different conditions. PASAD's unique architecture enables visualizing speech attributes distinctive to a CWS's fluent speech and mapping them to the speaker's corresponding speech-motor-control factors (i.e., speech articulators). The extracted knowledge can enhance understanding of children's fluent speech, speech-motor control (SMC), and stuttering development. Our comprehensive evaluation shows that PASAD outperforms state-of-the-art multi-modal baseline approaches under different conditions; is expressive and adaptive to the speaker's speech and physiology; is generalizable and robust; and is executable in real time on mobile and scalable devices.
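To make the data flow described above more concrete, the following is a minimal, dependency-free sketch of the general idea, not PASAD's actual architecture. It assumes: (1) the "non-local" step is a simplified dot-product self-attention over spectrogram frames with a residual connection and no learned projections; (2) the Hyper-Network is a single hypothetical linear map (`hyper_w`, `hyper_b`) from a physiological feature vector to the weights and bias of a per-frame scoring function; and (3) the paper's sequential network is swapped for importance-weighted pooling followed by a logistic output. All function names, parameters, and toy values are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def non_local(frames: Matrix) -> Matrix:
    """Simplified non-local block (assumption): every frame attends to all
    frames via dot-product affinity, and the result is added residually."""
    out = []
    for xi in frames:
        weights = softmax([dot(xi, xj) for xj in frames])
        agg = [sum(w * xj[d] for w, xj in zip(weights, frames)) for d in range(len(xi))]
        out.append([a + b for a, b in zip(xi, agg)])
    return out


def hyper_scorer(physio: Vector, hyper_w: Matrix, hyper_b: Vector) -> Vector:
    """Hypothetical hypernetwork: a linear map from physiological features to
    the D weights + 1 bias of the per-frame importance scorer."""
    return [dot(row, physio) + b for row, b in zip(hyper_w, hyper_b)]


def temporal_importance(frames: Matrix, params: Vector) -> Vector:
    # The last generated parameter is the bias; the rest score each frame.
    w, bias = params[:-1], params[-1]
    return softmax([dot(w, x) + bias for x in frames])


def classify(frames: Matrix, physio: Vector, hyper_w: Matrix, hyper_b: Vector,
             clf_w: Vector, clf_b: float) -> Tuple[float, Vector]:
    feats = non_local(frames)
    alpha = temporal_importance(feats, hyper_scorer(physio, hyper_w, hyper_b))
    # Importance-weighted pooling stands in for the sequential network here.
    pooled = [sum(a * f[d] for a, f in zip(alpha, feats)) for d in range(len(feats[0]))]
    prob = 1.0 / (1.0 + math.exp(-(dot(clf_w, pooled) + clf_b)))
    return prob, alpha


# Toy usage: 3 frames of 2 acoustic features; 2 hypothetical physiological features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hyper_w = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # (D + 1) x P
hyper_b = [0.0, 0.0, 0.0]
p_calm, alpha_calm = classify(frames, [1.0, 0.0], hyper_w, hyper_b, [0.5, -0.5], 0.0)
p_aroused, alpha_aroused = classify(frames, [0.0, 1.0], hyper_w, hyper_b, [0.5, -0.5], 0.0)
print(p_calm, alpha_calm)
print(p_aroused, alpha_aroused)
```

The design point the sketch illustrates is that the physiological signal does not enter as just another input feature; it generates the parameters that decide which time steps matter, so the same acoustic frames receive different importance weights when the (toy) physiological state changes.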