Security analysts face increasing pressure to triage large and complex vulnerability backlogs. Large Language Models (LLMs) offer a potential aid by automating parts of the interpretation process. We evaluate four models (ChatGPT, Claude, Gemini, and DeepSeek) across twelve prompting techniques to interpret semi-structured and unstructured vulnerability information. As a concrete use case, we test each model's ability to predict decision points in the Stakeholder-Specific Vulnerability Categorization (SSVC) framework: Exploitation, Automatable, Technical Impact, and Mission and Wellbeing. Using 384 real-world vulnerabilities from the VulZoo dataset, we issued more than 165,000 queries to assess performance under prompting styles including one-shot, few-shot, and chain-of-thought. We report F1 scores for each SSVC decision point and Cohen's kappa (weighted and unweighted) for the final SSVC decision outcomes. Gemini consistently ranked highest, leading on three of four decision points and yielding the most correct recommendations. Prompting with exemplars generally improved accuracy, although all models struggled on some decision points. Only DeepSeek achieved fair agreement under weighted metrics, and all models tended to over-predict risk. Overall, current LLMs do not replace expert judgment. However, specific LLM and prompt combinations show moderate effectiveness for targeted SSVC decisions. When applied with care, LLMs can support vulnerability prioritization workflows and help security teams respond more efficiently to emerging threats.
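The agreement metric reported for the final SSVC outcomes can be illustrated with a minimal sketch of Cohen's kappa in its unweighted and weighted forms. Several details here are assumptions rather than the study's setup: the outcome labels follow the CISA SSVC decision tree (Track, Track*, Attend, Act), the weighted variant uses linear weights (the abstract does not say whether linear or quadratic weighting was used), and the function name `cohen_kappa` is hypothetical.

```python
# Ordinal SSVC outcomes from lowest to highest priority.
# Assumption: labels follow the CISA SSVC decision tree.
OUTCOMES = ["Track", "Track*", "Attend", "Act"]


def cohen_kappa(reference, predicted, labels=OUTCOMES, weighted=False):
    """Return Cohen's kappa between two label sequences.

    With weighted=True, disagreements are penalized linearly by their
    distance on the ordinal scale (an assumed weighting scheme).
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("need two equal-length, non-empty label sequences")

    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    n = len(reference)

    # Joint distribution of (reference, predicted) label pairs.
    observed = [[0.0] * k for _ in range(k)]
    for ref, pred in zip(reference, predicted):
        observed[index[ref]][index[pred]] += 1.0 / n

    # Marginal distributions, used for chance-level disagreement.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return abs(i - j) / (k - 1)  # linear disagreement weight
        return 0.0 if i == j else 1.0    # any mismatch counts fully

    disagree_observed = sum(
        weight(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    disagree_expected = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )

    if disagree_expected == 0.0:
        # Degenerate case: both raters used one identical label throughout.
        return 1.0
    return 1.0 - disagree_observed / disagree_expected


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    analyst = ["Track", "Attend", "Act", "Act"]
    model = ["Track*", "Attend", "Act", "Attend"]
    print(cohen_kappa(analyst, model))
    print(cohen_kappa(analyst, model, weighted=True))
```

In the toy example both disagreements are off by one step, so the weighted score comes out higher than the unweighted one. This is why the two variants can rank models differently when most errors are near misses on the ordinal scale.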