Cybersecurity AI: The World's Top AI Agent for Security Capture-the-Flag (CTF)

Víctor Mayoral-Vilches, Luis Javier Navarrete-Lozano, Francesco Balassone, María Sanz-Gómez, Cristóbal R. J. Veas Chavez, Maite del Mundo de Torres, Vanesa Turiel

Are Capture-the-Flag competitions obsolete? In 2025, Cybersecurity AI (CAI) systematically conquered some of the world's most prestigious hacking competitions, achieving Rank #1 at multiple events and consistently outperforming thousands of human teams. Across five major circuits (HTB's AI vs Humans, Cyber Apocalypse (8,129 teams), Dragos OT CTF, UWSP Pointer Overflow, and the Neurogrid CTF showdown), CAI demonstrated that Jeopardy-style CTFs have become a solved game for well-engineered AI agents. At Neurogrid, CAI captured 41/45 flags to claim the $50,000 top prize; at Dragos OT, it sprinted 37% faster to 10K points than elite human teams; even when deliberately paused mid-competition, it maintained top-tier rankings. Critically, CAI achieved this dominance through our specialized alias1 model architecture, which delivers enterprise-scale AI security operations at unprecedented cost efficiency and with augmented autonomy, reducing the inference cost of 1B tokens from $5,940 to just $119 and making continuous security agent operation financially viable for the first time. These results force an uncomfortable reckoning: if autonomous agents now dominate competitions designed to identify top security talent at negligible cost, what are CTFs actually measuring? This paper presents comprehensive evidence of AI capability across the 2025 CTF circuit and argues that the security community must urgently transition from Jeopardy-style contests to Attack & Defense formats that genuinely test adaptive reasoning and resilience, capabilities that remain uniquely human, for now.

