Depression is a common mental disorder that affects millions of people worldwide, a burden that has grown more severe since the arrival of COVID-19. Nevertheless, proper diagnosis is inaccessible in many regions because of a severe shortage of psychiatrists. The shortage is even more acute in low-income countries, whose psychiatrist-to-population ratio is 210 times lower than that of wealthier countries. This study aimed to explore applications of deep learning to diagnosing depression from voice samples. We used data from the DAIC-WOZ database, which contains 189 vocal recordings from 154 individuals. Recordings from participants with a PHQ-8 score of 10 or higher were labeled as depressed, and those with a PHQ-8 score below 10 were labeled as healthy. We used mel-spectrograms to extract relevant features from the audio. Three types of encoders were tested: 1D CNN, 1D CNN-LSTM, and 1D CNN-GRU. After systematic hyperparameter tuning, the 1D CNN-GRU encoder with a kernel size of 5 and 15 seconds of recording data performed best, with an F1 score of 0.75, a precision of 0.64, and a recall of 0.92.
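The labeling rule and the reported metrics can be made concrete with a short Python sketch. It relies only on what the abstract states (the PHQ-8 cutoff of 10 and the reported precision and recall); the function names are hypothetical, and the 0 to 24 range check reflects standard PHQ-8 scoring (eight items scored 0 to 3) rather than anything the study specifies. It also confirms that the reported F1 score is consistent with the other two numbers, since F1 is the harmonic mean of precision and recall: 2(0.64)(0.92)/(0.64 + 0.92) ≈ 0.755.

```python
# Minimal sketch of the PHQ-8 labeling rule and the evaluation metrics.
# Function names are hypothetical; only the cutoff of 10 comes from the study.

PHQ8_THRESHOLD = 10  # PHQ-8 >= 10 -> depressed, PHQ-8 < 10 -> healthy


def label_from_phq8(score: int) -> int:
    """Return 1 (depressed) if the PHQ-8 score is 10 or higher, else 0 (healthy)."""
    if not 0 <= score <= 24:  # eight items, each scored 0-3
        raise ValueError("PHQ-8 scores range from 0 to 24")
    return int(score >= PHQ8_THRESHOLD)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (depressed) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print([label_from_phq8(s) for s in (3, 9, 10, 17)])  # [0, 0, 1, 1]
    # Consistency check on the reported numbers: precision 0.64, recall 0.92.
    print(round(f1_from(0.64, 0.92), 2))  # 0.75
```

The relatively low precision next to a high recall means the best model flags most depressed speakers but also labels a fair number of healthy speakers as depressed, a trade-off that may be acceptable for screening.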
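The abstract does not give the mel-spectrogram settings, so the following NumPy sketch uses assumed values throughout: 16 kHz audio, 25 ms Hann windows zero-padded to a 512-point FFT, a 10 ms hop, 64 mel bands, and the HTK mel formula. It illustrates the general technique (framing, power spectrum, triangular mel filterbank, log compression) on a synthetic 15-second clip; it is not the authors' exact pipeline, and in practice a library routine such as `librosa.feature.melspectrogram` would typically be used instead.

```python
import numpy as np

# Assumed parameters (not stated in the abstract).
SAMPLE_RATE = 16_000  # Hz
N_FFT = 512           # FFT size; frames are zero-padded to this length
WIN_LENGTH = 400      # 25 ms window at 16 kHz
HOP_LENGTH = 160      # 10 ms hop at 16 kHz
N_MELS = 64           # number of mel bands


def hz_to_mel(f):
    """HTK mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr=SAMPLE_RATE, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins to n_mels mel bands."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr=SAMPLE_RATE):
    """Return a (n_frames, N_MELS) log-mel spectrogram of a 1-D signal."""
    window = np.hanning(WIN_LENGTH)
    n_frames = 1 + (len(signal) - WIN_LENGTH) // HOP_LENGTH
    frames = np.stack([
        signal[i * HOP_LENGTH : i * HOP_LENGTH + WIN_LENGTH] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, n=N_FFT, axis=1)) ** 2  # (n_frames, 257)
    mel = power @ mel_filterbank(sr).T                          # (n_frames, N_MELS)
    return np.log(mel + 1e-10)  # small offset avoids log(0)


if __name__ == "__main__":
    # A synthetic 15-second tone standing in for one recording segment.
    t = np.arange(15 * SAMPLE_RATE) / SAMPLE_RATE
    clip = 0.5 * np.sin(2 * np.pi * 220.0 * t)
    print(log_mel_spectrogram(clip).shape)  # (1498, 64)
```

With these assumed settings, a 15-second clip yields 1498 frames of 64 mel bands, the kind of (time, frequency) matrix that a 1D convolutional encoder slides over along the time axis.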
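To show how a 1D CNN-GRU encoder can turn such a feature matrix into a prediction, here is a minimal NumPy forward pass. Only the kernel size of 5 comes from the abstract; the channel and hidden sizes, the single convolutional layer, the particular GRU gate equations, and the sigmoid output layer are illustrative assumptions, and the weights are random rather than trained. A real model would normally be built and trained with a framework, for example PyTorch's `torch.nn.Conv1d` and `torch.nn.GRU`.

```python
import numpy as np

# Hypothetical sizes; only KERNEL = 5 comes from the abstract.
N_MELS, CONV_CHANNELS, HIDDEN, KERNEL = 64, 32, 16, 5

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1d_relu(x, w, b):
    """'Valid' 1-D convolution over time, then ReLU.

    x: (T, C_in), w: (KERNEL, C_in, C_out), b: (C_out,) -> (T - KERNEL + 1, C_out)
    """
    k = w.shape[0]
    out = np.stack([
        np.einsum("kc,kco->o", x[t : t + k], w)  # sum over window and input channels
        for t in range(x.shape[0] - k + 1)
    ])
    return np.maximum(out + b, 0.0)


def gru_last_state(x, p):
    """Run a single-layer GRU over x (T, C) and return the final hidden state."""
    h = np.zeros(HIDDEN)
    for x_t in x:
        z = sigmoid(x_t @ p["Wz"] + h @ p["Uz"] + p["bz"])  # update gate
        r = sigmoid(x_t @ p["Wr"] + h @ p["Ur"] + p["br"])  # reset gate
        h_cand = np.tanh(x_t @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        h = (1.0 - z) * h + z * h_cand  # interpolate old state and candidate
    return h


def init_params():
    """Random weights for illustration only; a real model would be trained."""
    p = {
        "conv_w": rng.normal(0.0, 0.1, (KERNEL, N_MELS, CONV_CHANNELS)),
        "conv_b": np.zeros(CONV_CHANNELS),
        "out_w": rng.normal(0.0, 0.1, HIDDEN),
        "out_b": 0.0,
    }
    for g in ("z", "r", "h"):
        p["W" + g] = rng.normal(0.0, 0.1, (CONV_CHANNELS, HIDDEN))
        p["U" + g] = rng.normal(0.0, 0.1, (HIDDEN, HIDDEN))
        p["b" + g] = np.zeros(HIDDEN)
    return p


def predict_depressed_prob(mel_frames, p):
    """(T, N_MELS) log-mel features -> probability of the 'depressed' class."""
    feats = conv1d_relu(mel_frames, p["conv_w"], p["conv_b"])
    h = gru_last_state(feats, p)
    return float(sigmoid(h @ p["out_w"] + p["out_b"]))


if __name__ == "__main__":
    params = init_params()
    frames = rng.normal(size=(1498, N_MELS))  # stand-in for a 15 s log-mel matrix
    print(predict_depressed_prob(frames, params))  # an (untrained) probability
```

The convolution picks up local patterns across five consecutive frames, and the GRU summarizes the whole sequence into a fixed-size hidden state that a logistic layer maps to a probability. Replacing the GRU with an LSTM would give the 1D CNN-LSTM variant, and dropping the recurrent layer in favor of pooling would give a plain 1D CNN.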