Depression is a large-scale mental health problem, and its automatic detection is a challenging task for machine learning researchers. Datasets such as the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) have been created to aid research in this area. However, on top of the challenges inherent in accurately detecting depression, biases in datasets may result in skewed classification performance. In this paper we examine gender bias in the DAIC-WOZ dataset. We show that gender biases in DAIC-WOZ can lead to an overreporting of performance. By applying concepts from Fair Machine Learning, such as data re-distribution, and by using raw audio features, we can mitigate the harmful effects of these biases.
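The abstract mentions data re-distribution as one mitigation technique. A minimal sketch of one simple form of re-distribution, gender-balanced undersampling, is shown below; the function and field names are illustrative and not taken from the paper.

```python
import random

def rebalance_by_group(samples, group_key, seed=0):
    """Undersample so every group (e.g. each gender) contributes
    equally many examples -- one simple form of data re-distribution
    used to mitigate dataset bias. Names here are illustrative."""
    rng = random.Random(seed)
    groups = {}
    for s in samples:
        groups.setdefault(s[group_key], []).append(s)
    # Size of the smallest group sets the per-group quota
    n = min(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, n))
    rng.shuffle(balanced)
    return balanced

# Toy example: 3 female and 5 male interview records
data = [{"id": i, "gender": "F"} for i in range(3)] + \
       [{"id": i, "gender": "M"} for i in range(3, 8)]
balanced = rebalance_by_group(data, "gender")
# Each gender now contributes an equal number of records
```

This undersampling variant discards majority-group examples; oversampling the minority group is an alternative when the dataset is too small to shrink.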