This paper presents a macroscopic approach to the automatic detection of speech sound disorder (SSD) in child speech. Typically, SSD manifests as persistent articulation and phonological errors on specific phonemes of the language. The disorder can be detected by focused analysis of the phonemes or words elicited from the child subject. In the present study, instead of attempting to detect individual phone- and word-level errors, we propose to extract a subject-level representation from a long utterance constructed by concatenating multiple test words. The speaker verification approach and posterior features generated by deep neural network models are applied to derive various types of holistic representations. A linear classifier is trained to differentiate disordered speech from normal speech. On the task of detecting SSD in Cantonese-speaking children, experimental results show that the proposed approach achieves improved detection performance over a previous method that requires fusing phone-level detection results. Using articulatory posterior features to derive i-vectors from multiple-word utterances achieves an unweighted average recall of 78.2% and a macro F1 score of 78.0%.
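To make the evaluation setup concrete, the following is a minimal, illustrative sketch (not the authors' code) of the final classification stage: a linear classifier operating on subject-level utterance representations (such as i-vectors derived from articulatory posterior features), scored with the two metrics reported above. The data here are synthetic placeholders, and logistic regression is one possible choice of linear classifier.

```python
# Illustrative sketch only: linear classification of subject-level
# representations, evaluated with unweighted average recall (UAR)
# and macro F1. Features and labels below are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: one fixed-dimensional vector per child subject
# (standing in for an i-vector); label 1 = SSD, 0 = typical speech.
X = rng.normal(size=(200, 400))
y = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# UAR is the macro-averaged recall over the two classes.
uar = recall_score(y_te, pred, average="macro")
macro_f1 = f1_score(y_te, pred, average="macro")
print(f"UAR: {uar:.3f}  macro F1: {macro_f1:.3f}")
```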