Depression is a common and serious mood disorder that negatively affects a patient's ability to function normally in daily tasks. Speech has proven to be a robust tool for depression diagnosis. Research in psychiatry has concentrated on fine-grained analysis of word-level speech components contributing to the manifestation of depression in speech, and has revealed significant phoneme-level variations in depressed speech. In contrast, research on Machine Learning-based automatic recognition of depression from speech has focused on exploring various acoustic features for detecting depression and its severity level; few studies have incorporated phoneme-level speech components into automatic assessment systems. In this paper, we propose an Artificial Intelligence (AI) based application for clinical depression recognition and assessment from speech. We investigate the acoustic characteristics of phoneme units, specifically vowels and consonants, for depression recognition via Deep Learning. We present and compare three spectrogram-based Deep Neural Network architectures, trained on phoneme consonant units, vowel units, and their fusion, respectively. Our experiments show that deep-learned consonant-based acoustic characteristics lead to better recognition results than vowel-based ones. The fusion of vowel and consonant speech characteristics through a deep network significantly outperforms the single-space networks as well as state-of-the-art deep learning approaches on the DAIC-WOZ database.
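The spectrogram front-end mentioned above can be sketched minimally with a short-time FFT; the window length, hop size, and sample rate here are illustrative assumptions, not parameters taken from the paper:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time FFT with a Hann window.
    Minimal illustrative sketch; the paper's actual front-end
    parameters are not specified in the abstract."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One row per frame, one column per frequency bin
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: a 1-second 440 Hz tone at 8 kHz as a stand-in for a vowel segment
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (n_frames, frame_len // 2 + 1)
```

Per-phoneme spectrograms like this one would then be fed to the consonant, vowel, or fusion network as image-like inputs.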