As large language models (LLMs) rapidly displace traditional sources of expertise, their capacity to correct misinformation has become a core concern. We investigate whether prompt framing systematically modulates misinformation correction, a phenomenon we term 'epistemic fragility'. We manipulated prompts along four dimensions: open-mindedness of framing, user intent, user role, and complexity. Across ten misinformation domains, we generated 320 prompts and elicited 2,560 responses from four frontier LLMs, which were coded for the strength of misinformation correction and the rectification strategies used. Analyses showed that creative intent, an expert user role, and closed framing significantly reduced both the likelihood of correction and the effectiveness of the strategies employed. We also found striking model differences: Gemini 2.5 Pro had 74% lower odds of producing a strong correction than Claude Sonnet 4.5. These findings highlight epistemic fragility as an important structural property of LLMs, challenging current guardrails and underscoring the need for alignment strategies that prioritize epistemic integrity over conversational compliance.
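For readers who want to see how the reported counts fit together, the following is a minimal sketch of a factorial prompt-generation design consistent with the abstract. Only the totals (ten domains, four manipulated dimensions, 320 prompts, four models, 2,560 responses) come from the text; the specific factor levels, sampling allocation, and model names other than Claude Sonnet 4.5 and Gemini 2.5 Pro are assumptions for illustration.

```python
# Hypothetical sketch of the factorial design implied by the abstract.
# Factor levels and samples-per-prompt are assumptions; only the totals
# (320 prompts, 4 models, 2,560 responses) are taken from the text.
from itertools import product

domains = [f"domain_{i}" for i in range(10)]           # 10 misinformation domains (from the abstract)
open_mindedness = ["open", "closed"]                   # assumed 2 levels
user_intent = ["informational", "creative"]            # assumed 2 levels
user_role = ["layperson", "expert"]                    # assumed 2 levels
complexity = ["low", "mid-low", "mid-high", "high"]    # assumed 4 levels, giving 32 variants per domain

prompts = list(product(domains, open_mindedness, user_intent, user_role, complexity))
assert len(prompts) == 320                             # matches the prompt count in the abstract

models = ["Claude Sonnet 4.5", "Gemini 2.5 Pro", "model_3", "model_4"]  # two named; others unspecified
samples_per_prompt = 2                                 # assumed allocation: 320 x 4 x 2 = 2,560
assert len(prompts) * len(models) * samples_per_prompt == 2560
```

Under these assumed level counts the arithmetic reproduces the reported sample sizes; the actual study may have allocated levels differently while arriving at the same totals.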