We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to generate phishing content, to deploying those messages against real targets, to successfully compromising elderly victims. We systematically evaluated safety guardrails across six frontier LLMs spanning four attack categories, revealing critical failures where several models exhibited near-complete susceptibility to certain attack vectors. In a human validation study with 108 senior volunteers, AI-generated phishing emails successfully compromised 11\% of participants. Our work uniquely demonstrates the complete attack pipeline targeting elderly populations, highlighting that current AI safety measures fail to protect those most vulnerable to fraud. Beyond generating phishing content, LLMs enable attackers to overcome language barriers and conduct multi-turn trust-building conversations at scale, fundamentally transforming fraud economics. While some providers report voluntary counter-abuse efforts, we argue these remain insufficient.