Self-disclosed mental health diagnoses, which serve as ground-truth annotations of mental health status in the absence of clinical measures, underpin the conclusions of most computational studies of mental health language from the last decade. However, psychiatric conditions are dynamic; a prior depression diagnosis may no longer be indicative of an individual's mental health, whether because of treatment or other mitigating factors. We ask: to what extent do self-disclosures of mental health diagnoses remain relevant over time? We analyze recent activity from individuals who disclosed a depression diagnosis on social media more than five years ago and, in turn, acquire a new understanding of how presentations of mental health status on social media manifest longitudinally. We also provide expanded evidence for the presence of personality-related biases in datasets curated using self-disclosed diagnoses. Our findings motivate three practical recommendations for improving mental health datasets curated using self-disclosed diagnoses: 1) annotate diagnosis dates and psychiatric comorbidities; 2) sample control groups using propensity score matching; 3) identify and remove spurious correlations introduced by selection bias.
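The second recommendation, sampling control groups with propensity score matching, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the procedure used in this work: the two covariates (imagined as standardized posting volume and account age), the toy data, the plain gradient-descent logistic regression, the greedy 1:1 nearest-neighbor matching without replacement, and the `caliper` parameter are all hypothetical choices.

```python
import math


def _sigmoid(z):
    # Clamp the logit to avoid overflow in math.exp for extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(X, treated, lr=0.1, epochs=2000):
    """Fit P(treated | covariates) with logistic regression (batch gradient descent)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, treated):
            err = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def match_controls(X, treated, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Returns (treated_index, control_index) pairs. Each control is used at
    most once, and a pair is kept only if the two scores differ by no more
    than `caliper`.
    """
    w, b = fit_propensity(X, treated)
    scores = [_sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) for x in X]
    available = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i, t in enumerate(treated):
        if t != 1 or not available:
            continue
        # Closest remaining control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(scores[k] - scores[i]), k))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical toy data: rows are users, columns are standardized covariates.
# treated[i] == 1 marks a user with a self-disclosed diagnosis (illustrative only).
X = [[0.9, 0.8], [0.7, 1.0], [-0.8, -0.9], [0.8, 0.9],
     [-1.0, -0.7], [0.1, 0.0], [-0.9, -1.0]]
treated = [1, 1, 0, 0, 0, 0, 0]

pairs = match_controls(X, treated, caliper=1.0)
print(pairs)
```

Matching on a single estimated score, rather than on each covariate separately, keeps the procedure tractable as the number of covariates grows. A tighter caliper trades sample size for closer matches, so the value chosen should be reported alongside the dataset.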