Automated mental health analysis shows great potential for enhancing the efficiency and accessibility of mental health care, and recent dominant methods utilize pre-trained language models (PLMs) as the backbone while incorporating emotional information. The latest large language models (LLMs), such as ChatGPT, exhibit strong capabilities on diverse natural language processing tasks. However, existing studies of ChatGPT's zero-shot performance on mental health analysis suffer from inadequate evaluation, limited use of emotional information, and a lack of explainability. In this work, we comprehensively evaluate the mental health analysis and emotional reasoning abilities of ChatGPT on 11 datasets across 5 tasks: binary and multi-class mental health condition detection, cause/factor detection of mental health conditions, emotion recognition in conversations, and causal emotion entailment. We empirically analyze how different prompting strategies with emotional cues affect ChatGPT's mental health analysis ability and explainability. Experimental results show that ChatGPT outperforms traditional neural network methods but still falls significantly short of advanced task-specific methods. Qualitative analysis shows its potential for explainability compared with advanced black-box methods, but also reveals limitations in robustness and instances of inaccurate reasoning. Prompt engineering with emotional cues proves effective in improving performance on mental health analysis, but requires an appropriate method of emotion infusion.
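The contrast between a plain zero-shot prompt and one infused with emotional cues can be sketched as follows. This is a minimal illustration, assuming a hypothetical `build_prompt` helper and illustrative template wording; it does not reproduce the paper's exact prompts.

```python
def build_prompt(post: str, emotional_cue: str = "") -> str:
    """Compose a zero-shot binary depression-detection prompt,
    optionally infusing an emotional cue (hypothetical template)."""
    parts = [
        "Consider this post to answer the question.",
        f"Post: {post}",
    ]
    if emotional_cue:
        # Emotion infusion: an emotion label obtained upstream
        # (e.g. from an emotion classifier) is added as an extra cue.
        parts.append(f"The main emotion expressed in the post is {emotional_cue}.")
    parts.append("Question: Does the poster suffer from depression? Answer yes or no.")
    return "\n".join(parts)


post = "I can't sleep and nothing feels worth doing anymore."
plain_prompt = build_prompt(post)
cue_prompt = build_prompt(post, emotional_cue="sadness")
print(plain_prompt)
print(cue_prompt)
```

The only difference between the two strategies is the injected emotion sentence, which is the kind of "emotion infusion" whose placement and phrasing the paper finds must be chosen carefully.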