Named Entity Recognition (NER) in the rare disease domain poses unique challenges due to limited labeled data, semantic ambiguity between entity types, and long-tail distributions. In this study, we evaluate the capabilities of GPT-4o for rare disease NER under low-resource settings, using a range of adaptation strategies: zero-shot prompting, few-shot in-context learning, retrieval-augmented generation (RAG), and task-level fine-tuning. We design a structured prompting framework that encodes domain-specific knowledge and disambiguation rules for four entity types. We further introduce two semantically guided few-shot example selection methods to improve in-context performance while reducing labeling effort. Experiments on the RareDis Corpus show that GPT-4o achieves competitive or superior performance compared to BioClinicalBERT, with task-level fine-tuning yielding the strongest performance among the evaluated approaches and improving upon the previously reported BioClinicalBERT baseline. Cost-performance analysis reveals that few-shot prompting delivers high returns at low token budgets. RAG provides limited overall gains but can improve recall for challenging entity types, especially signs and symptoms. An error taxonomy highlights common failure modes such as boundary drift and type confusion, suggesting opportunities for post-processing and hybrid refinement. Our results demonstrate that prompt-optimized LLMs can serve as effective, scalable alternatives to traditional supervised models in biomedical NER, particularly in rare disease applications where annotated data is scarce.
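The failure modes named above (boundary drift and type confusion) can be made concrete with a small span-matching routine. The sketch below is a generic illustration, not the authors' evaluation code. It assumes character-offset spans, strict matching (exact span and exact type) for precision, recall, and F1, and hypothetical label strings such as `SIGN`, `SYMPTOM`, and `DISEASE`; the names `Entity`, `categorize`, `missed`, and `strict_prf` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # entity type; label strings here are assumed, e.g. "SIGN"


def overlaps(a: Entity, b: Entity) -> bool:
    """True if the two half-open spans share at least one character."""
    return a.start < b.end and b.start < a.end


def categorize(gold: list[Entity], pred: list[Entity]) -> list[tuple[Entity, str]]:
    """Assign each predicted entity one error category relative to the gold spans."""
    results = []
    for p in pred:
        if p in gold:  # exact span and exact type
            results.append((p, "correct"))
            continue
        if any(g.start == p.start and g.end == p.end for g in gold):
            results.append((p, "type_confusion"))  # right span, wrong type
            continue
        overlapping = [g for g in gold if overlaps(g, p)]
        if any(g.label == p.label for g in overlapping):
            results.append((p, "boundary_error"))  # right type, drifted span
        elif overlapping:
            results.append((p, "boundary_and_type"))
        else:
            results.append((p, "spurious"))  # no gold entity nearby
    return results


def missed(gold: list[Entity], pred: list[Entity]) -> list[Entity]:
    """Gold entities that no prediction overlaps at all."""
    return [g for g in gold if not any(overlaps(g, p) for p in pred)]


def strict_prf(gold: list[Entity], pred: list[Entity]) -> tuple[float, float, float]:
    """Strict-match precision, recall, and F1 over (start, end, label) triples."""
    tp = len(set(gold) & set(pred))
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    text = "Patients present with muscle weakness, ataxia and seizures."
    gold = [Entity(22, 37, "SIGN"), Entity(39, 45, "SIGN"), Entity(50, 58, "SYMPTOM")]
    pred = [Entity(22, 28, "SIGN"), Entity(39, 45, "SIGN"),
            Entity(50, 58, "SIGN"), Entity(0, 8, "DISEASE")]
    for ent, category in categorize(gold, pred):
        print(text[ent.start:ent.end], ent.label, category)
    # muscle SIGN boundary_error
    # ataxia SIGN correct
    # seizures SIGN type_confusion
    # Patients DISEASE spurious
    print(strict_prf(gold, pred))  # precision 0.25, recall ~0.333, F1 ~0.286
```

Separating near-misses (boundary or type errors) from spurious and missed entities in this way shows which errors a post-processing step, such as span trimming or type re-labeling, could plausibly recover, whereas strict F1 alone counts all of them as plain false positives.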