Large language models (LLMs) have been successful in several natural language understanding tasks and could be relevant for natural language processing (NLP)-based mental health application research. In this work, we report the performance of the LLM-based ChatGPT (with the gpt-3.5-turbo backend) on three text-based mental health classification tasks: stress detection (2-class classification), depression detection (2-class classification), and suicidality detection (5-class classification). We obtained annotated social media posts for the three classification tasks from public datasets. We then used the ChatGPT API to classify each post, supplying an input prompt that described the classification task. We obtained F1 scores of 0.73, 0.86, and 0.37 for stress detection, depression detection, and suicidality detection, respectively. A baseline model that always predicted the dominant class resulted in F1 scores of 0.35, 0.60, and 0.19 on the same three tasks. The zero-shot classification performance obtained with ChatGPT, which exceeds the dominant-class baseline on all three tasks, indicates a potential use of language models for mental health classification tasks.
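As an illustration of the dominant-class baseline mentioned above, the following is a minimal sketch of how such a baseline and its F1 score could be computed. The abstract does not state which F1 averaging was used, so both macro and weighted averaging are shown as assumptions; the function names and the toy labels are hypothetical and are not taken from the datasets used in this work.

```python
from collections import Counter


def majority_baseline(labels):
    """Predict the most frequent label for every example."""
    dominant, _ = Counter(labels).most_common(1)[0]
    return [dominant] * len(labels)


def f1_per_class(y_true, y_pred, cls):
    """F1 for one class, defined as 0.0 when the class is never hit."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by class frequency in the true labels."""
    counts = Counter(y_true)
    n = len(y_true)
    return sum(counts[c] / n * f1_per_class(y_true, y_pred, c) for c in counts)


# Hypothetical toy labels, NOT the data from this work.
toy_labels = ["stress"] * 6 + ["no_stress"] * 4
baseline_preds = majority_baseline(toy_labels)

print(round(macro_f1(toy_labels, baseline_preds), 3))     # 0.375
print(round(weighted_f1(toy_labels, baseline_preds), 3))  # 0.45
```

Because the baseline never predicts a minority class, those classes contribute an F1 of zero, which is why its score falls as the number of classes grows or as the labels become more balanced.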