Depression detection from speech has attracted considerable attention in recent years. However, the significance of speaker-specific information for this task has not yet been explored. In this work, we analyze the significance of speaker embeddings for depression detection from speech. Experimental results show that speaker embeddings provide important cues for achieving state-of-the-art performance in depression detection. We also show that combining speaker embeddings with conventional OpenSMILE and COVAREP features, which carry complementary information, further improves depression detection performance. Finally, we analyze the significance of temporal context in training deep learning models for depression detection.