Mental health remains a significant public health challenge worldwide. With the increasing popularity of online platforms, many people use them to share their mental health conditions, express their feelings, and seek help from the community and counselors. Some of these platforms, such as Reachout, are dedicated forums where users register to seek help. Others, such as Reddit, provide subreddits where users post publicly but anonymously about their mental health distress. Because posts vary in length, a short but informative summary of each post helps counselors process them quickly. To facilitate research on summarizing online mental health posts, we introduce the Mental Health Summarization dataset, MentSum, containing over 24k carefully selected English-language user posts from 43 mental health subreddits on Reddit, each paired with its short user-written summary (called TLDR). This domain-specific dataset could be of interest not only for generating short summaries on Reddit, but also for summarizing posts on dedicated mental health forums such as Reachout. We further evaluate state-of-the-art extractive and abstractive summarization baselines in terms of ROUGE scores, and finally conduct an in-depth human evaluation study of both user-written and system-generated summaries, highlighting the challenges in this research.