CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic

Mental health disorders affect millions worldwide, yet early detection remains a major challenge, particularly for Arabic-speaking populations where resources are limited and mental health discourse is often discouraged due to cultural stigma. While substantial research has focused on English-language mental health detection, Arabic remains significantly underexplored, partly due to the scarcity of annotated datasets. We present CARMA, the first automatically annotated large-scale dataset of Arabic Reddit posts. The dataset encompasses six mental health conditions, such as Anxiety, Autism, and Depression, and a control group. CARMA surpasses existing resources in both scale and diversity. We conduct qualitative and quantitative analyses of lexical and semantic differences between users, providing insights into the linguistic markers of specific mental health conditions. To demonstrate the dataset's potential for further mental health analysis, we perform classification experiments using a range of models, from shallow classifiers to large language models. Our results highlight the promise of advancing mental health detection in underrepresented languages such as Arabic.

