Although explainability and interpretability have received significant attention in artificial intelligence (AI) and natural language processing (NLP) for mental health, reasoning has not been examined in the same depth. Addressing this gap is essential to bridge NLP and mental health through interpretable, reasoning-capable AI systems. To this end, we investigate the pragmatic reasoning capabilities of large language models (LLMs) in the mental health domain. We introduce the PRiMH dataset and propose pragmatic reasoning tasks in mental health built on the pragmatic phenomena of implicature and presupposition. Specifically, we formulate two tasks on implicature and one task on presupposition. To benchmark the dataset and the proposed tasks, we consider four models: Llama3.1, Mistral, MentaLLaMA, and Qwen. The experimental results suggest that Mistral and Qwen show substantial reasoning abilities in the domain. We then study the behavior of MentaLLaMA on the proposed reasoning tasks using the attention-rollout mechanism. In addition, we propose three StiPRompts to study the stigma around mental health with the state-of-the-art LLMs GPT4o-mini, Deepseek-chat, and Claude-3.5-haiku. Our evaluation shows that Claude-3.5-haiku handles stigma more responsibly than the other two LLMs.