Evaluating AI Companies' Frontier Safety Frameworks: Methodology and Results

Following the Seoul AI Safety Summit in 2024, twelve AI companies published frontier safety frameworks describing how they manage catastrophic risks from advanced AI systems. These frameworks now serve as a key mechanism for AI risk governance and are referenced by regulations and governance instruments such as the EU AI Act's Code of Practice and California's Transparency in Frontier Artificial Intelligence Act. Given their centrality to AI risk management, these frameworks warrant careful assessment. Existing assessments, however, evaluate them at a high level of abstraction and offer little guidance on the specific practices companies should adopt. We address this gap by developing a 65-criterion assessment methodology grounded in established risk management principles from safety-critical industries. We evaluate the twelve frameworks across four dimensions: risk identification, risk analysis and evaluation, risk treatment, and risk governance. Current scores are low, ranging from 8% to 35%. A company that adopted the best practices already present across the existing frameworks could reach 52%. The most critical gaps are nearly universal: companies generally fail to (a) define quantitative risk tolerances, (b) specify capability thresholds for pausing development, and (c) systematically identify unknown risks. To guide improvement, we provide specific recommendations for each company and each criterion.
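The abstract does not state how the 52% figure is aggregated. One plausible reading, assumed here, is that each criterion takes the highest score any existing framework already achieves, and the overall score is an unweighted average across criteria. The sketch below illustrates that arithmetic on invented toy data. The 0-to-1 scale, the unweighted mean, the company labels, and the criterion names are all hypothetical and are not the authors' method or results.

```python
from statistics import mean

# Hypothetical toy data: scores[company][criterion] on an ASSUMED 0-1 scale.
# Company labels, criterion names, and values are illustrative only.
scores = {
    "company_a": {"risk_tolerance": 0.0, "pause_thresholds": 0.5, "unknown_risks": 0.25},
    "company_b": {"risk_tolerance": 0.25, "pause_thresholds": 0.0, "unknown_risks": 0.0},
    "company_c": {"risk_tolerance": 0.0, "pause_thresholds": 0.25, "unknown_risks": 0.5},
}


def overall_score(criterion_scores):
    """Unweighted mean over criteria, as a percentage (an assumed aggregation)."""
    return 100 * mean(criterion_scores.values())


def best_practice_ceiling(all_scores):
    """Score of a hypothetical framework that takes the best existing score per criterion."""
    criteria = next(iter(all_scores.values())).keys()
    best = {c: max(company[c] for company in all_scores.values()) for c in criteria}
    return overall_score(best)


for name, s in scores.items():
    print(f"{name}: {overall_score(s):.1f}%")
print(f"best-practice ceiling: {best_practice_ceiling(scores):.1f}%")
```

On this toy data no single company exceeds 25%, yet combining each criterion's best existing practice yields about 41.7%. This mirrors, under the stated assumptions, how a best-practice ceiling can sit well above every individual company's score.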