Motivated by recent advances in inferring users' mental state from social media posts, we identify and formulate the problem of finding causal indicators behind mental illness in self-reported text. Prior work on causal explanation analysis has relied on rule-based methods applied to curated Facebook data. Our investigation of transformer-based models for multi-class causal categorization in Reddit posts points to the problem of long text: a single post can contain as many as 4000 words, while end-to-end transformer-based models are subject to a maximum input length for any given instance. To handle this problem, we use Longformer and feed its encoding to a transformer-based classifier. The experimental results show that Longformer achieves a new state-of-the-art result on M-CAMS, a publicly available dataset, with a 62\% F1-score. Cause-specific analysis and an ablation study demonstrate the effectiveness of Longformer. We believe our work facilitates causal analysis of depression and suicide risk on social media data, and shows potential for application to other mental health conditions.
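The length limitation mentioned above can be illustrated with a small back-of-the-envelope sketch. It is not taken from the paper. It assumes one token per word (subword tokenizers usually produce more), a 512-token limit typical of BERT-style encoders, and the 4096-token limit and 512-token attention window of the commonly distributed Longformer base configuration. The function names are hypothetical, and the windowed count ignores Longformer's additional global attention on selected tokens.

```python
def truncated_fraction(n_tokens: int, max_len: int) -> float:
    """Fraction of a post's tokens that are dropped when input is cut at max_len."""
    if n_tokens <= 0:
        return 0.0
    return max(0, n_tokens - max_len) / n_tokens


def full_attention_pairs(n_tokens: int) -> int:
    """Token pairs scored by dense self-attention: every token attends to every token."""
    return n_tokens * n_tokens


def windowed_attention_pairs(n_tokens: int, window: int) -> int:
    """Token pairs scored when each token attends only to itself and to
    window // 2 neighbours on each side (sliding-window attention only)."""
    half = window // 2
    total = 0
    for i in range(n_tokens):
        lo = max(0, i - half)
        hi = min(n_tokens - 1, i + half)
        total += hi - lo + 1
    return total


if __name__ == "__main__":
    post_len = 4000  # assumption: one token per word for a 4000-word post

    # Share of the post lost under each (assumed) maximum input length.
    print("dropped at 512: ", truncated_fraction(post_len, 512))
    print("dropped at 4096:", truncated_fraction(post_len, 4096))

    # Relative attention cost of dense vs. sliding-window attention.
    dense = full_attention_pairs(post_len)
    local = windowed_attention_pairs(post_len, window=512)
    print("dense pairs:", dense)
    print("local pairs:", local)
    print("local / dense:", local / dense)
```

Under these assumptions, a 512-token limit discards most of a 4000-word post, whereas a 4096-token limit keeps all of it, and the sliding window keeps the number of scored token pairs growing roughly linearly with post length rather than quadratically.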