迈向揭示医疗大语言模型中的细微偏见 (Toward Revealing Nuanced Biases in Medical LLMs)

Large language models (LLMs) used in medical applications are known to be prone to exhibiting biased and unfair patterns. Prior to deploying these in clinical decision-making, it is crucial to identify such bias patterns to enable effective mitigation and minimize negative impacts. In this study, we present a novel framework combining knowledge graphs (KGs) with auxiliary (agentic) LLMs to systematically reveal complex bias patterns in medical LLMs. The proposed approach integrates adversarial perturbation (red teaming) techniques to identify subtle bias patterns and adopts a customized multi-hop characterization of KGs to enhance the systematic evaluation of target LLMs. It aims not only to generate more effective red-teaming questions for bias evaluation but also to utilize those questions more effectively in revealing complex biases. Through a series of comprehensive experiments on three datasets, six LLMs, and five bias types, we demonstrate that our proposed framework exhibits a noticeably greater ability and scalability in revealing complex biased patterns of medical LLMs compared to other common approaches.

翻译：在医疗应用中使用的大语言模型 (LLMs) 已知容易表现出有偏见和不公平的模式。在将这些模型部署于临床决策之前，识别此类偏见模式对于实现有效缓解并最小化负面影响至关重要。在本研究中，我们提出了一种新颖的框架，将知识图谱 (KGs) 与辅助性 (代理) LLMs 相结合，以系统地揭示医疗 LLMs 中复杂的偏见模式。所提出的方法集成了对抗性扰动 (红队测试) 技术来识别细微的偏见模式，并采用定制的 KGs 多跳表征来增强对目标 LLMs 的系统性评估。其目标不仅在于为偏见评估生成更有效的红队测试问题，还在于更有效地利用这些问题来揭示复杂的偏见。通过对三个数据集、六个 LLMs 和五种偏见类型进行的一系列综合实验，我们证明，与其他常见方法相比，我们提出的框架在揭示医疗 LLMs 的复杂偏见模式方面表现出显著更强的能力和可扩展性。