Speech-based depression classification has gained considerable popularity in recent years. However, most classification studies have focused on binary classification, distinguishing depressed subjects from non-depressed subjects. In this paper, we formulate depression classification as a severity-level classification problem to provide more granular classification outcomes. We use articulatory coordination features (ACFs), developed to capture the changes in neuromotor coordination that happen as a result of psychomotor slowing, a necessary feature of Major Depressive Disorder. The ACFs derived from vocal tract variables (TVs) are used to train a dilated Convolutional Neural Network based depression classification model that produces segment-level predictions. We then propose a Recurrent Neural Network based approach to obtain session-level predictions from the segment-level predictions. We show that the strengths of the segment-wise classifier are amplified when a session-wise classifier is trained on embeddings obtained from it. The model trained on ACFs derived from TVs shows a relative improvement of 27.47% in Unweighted Average Recall (UAR) on the session-level classification task, compared to the model trained on ACFs derived from Mel Frequency Cepstral Coefficients (MFCCs).
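For readers unfamiliar with the metric, UAR is commonly computed as the mean of the per-class recalls, so every severity level counts equally regardless of how many sessions it contains. The sketch below illustrates that definition and how a relative improvement between two UAR values can be calculated. The function names, labels, and numbers are hypothetical and are not taken from the paper's experiments.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class is weighted equally."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_improvement(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Hypothetical session labels for three severity levels (0, 1, 2).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2]

# Per-class recalls are 3/4, 1/2 and 2/2, so their mean is 0.75.
uar = unweighted_average_recall(y_true, y_pred)
print(uar)
```

Plain accuracy on the same hypothetical labels would be 6/8, which happens to coincide here; the two metrics diverge as soon as class sizes and per-class error rates become more unbalanced, which is why UAR is often preferred for severity-level tasks.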