Research on mental health conditions using public online data, including Reddit, has recently surged in NLP and health research, but it has not reported user characteristics, which are important for judging the generalisability of findings. This paper shows how existing NLP methods can yield information on the clinical, demographic, and identity characteristics of almost 20K Reddit users who self-report a bipolar disorder diagnosis. This population consists mainly of young or middle-aged US-based adults, slightly more feminine- than masculine-gendered, who often report additional mental health diagnoses; these characteristics are compared with general Reddit statistics and epidemiological studies. Additionally, this paper carefully evaluates all methods and discusses ethical issues.