Research on depression detection has grown over the last few decades, but it remains bottlenecked by limited data availability and by the difficulty of representation learning. Self-supervised learning has recently proved successful for pretraining text embeddings and has been applied broadly to related tasks with sparse data, whereas pretrained audio embeddings based on self-supervised learning have rarely been investigated. This paper proposes DEPA, a self-supervised, pretrained depression audio embedding method for depression detection. An encoder-decoder network is used to extract DEPA on in-domain depression datasets (DAIC and MDD) and out-of-domain datasets (Switchboard, Alzheimer's). With DEPA as the audio embedding extracted at the response level, a significant performance gain is achieved on downstream tasks, evaluated on both sparse datasets such as DAIC and a large major depressive disorder (MDD) dataset. Beyond presenting a novel method for extracting response-level embeddings for depression detection, this paper, more significantly, explores self-supervised learning for a specific task within audio processing.