International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management

Yoshua Bengio, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Ben Bucknall, Philip Fox, Nestor Maslej, Conor McGlynn, Malcolm Murray, Shalaleh Rismani, Stephen Casper, Jessica Newman, Daniel Privitera, Sören Mindermann, Daron Acemoglu, Thomas G. Dietterich, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Susan Leavy, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Gopal Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Alavaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang, Leandro Aguirre, Olubunmi Ajala, Fahad Albalawi, Noora AlMalek, Christian Busch, André Carvalho, Jonathan Collas, Amandeep Gill, Ahmet Hatip, Juha Heikkilä, Chris Johnson, Gill Jolly, Ziv Katzir, Mary Kerema, Hiroaki Kitano, Antonio Krüger, Aoife McLysaght, Oleksii Molchanovskyi, Andrea Monti, Kyoung Mu Lee, Mona Nemer, Nuria Oliver, Raquel Pezoa, Audrey Plonk, José Portillo, Balaraman Ravindran, Hammam Riza, Crystal Rugege, Haroon Sheikh, Denise Wong, Yi Zeng, Liming Zhu

This second update to the 2025 International AI Safety Report assesses new developments in general-purpose AI risk management over the past year. It examines how researchers, public institutions, and AI developers are approaching risk management for general-purpose AI. In recent months, for example, three leading AI developers applied enhanced safeguards to their new models, as their internal pre-deployment testing could not rule out the possibility that these models could be misused to help create biological weapons. Beyond specific precautionary measures, there have been a range of other advances in techniques for making AI models and systems more reliable and resistant to misuse. These include new approaches in adversarial training, data curation, and monitoring systems. In parallel, institutional frameworks that operationalise and formalise these technical capabilities are starting to emerge: the number of companies publishing Frontier AI Safety Frameworks more than doubled in 2025, and governments and international organisations have established a small number of governance frameworks for general-purpose AI, focusing largely on transparency and risk assessment.
