We propose a Kardashev-inspired yet operational Autonomous AI (AAI) Scale that measures progression from fixed robotic process automation (AAI-0) to full artificial general intelligence (AAI-4) and beyond. Unlike narrative ladders, our scale is multi-axis and testable. We define ten capability axes (Autonomy, Generality, Planning, Memory/Persistence, Tool Economy, Self-Revision, Sociality/Coordination, Embodiment, World-Model Fidelity, Economic Throughput), aggregated into a composite AAI-Index (a weighted geometric mean). We introduce a measurable Self-Improvement Coefficient $\kappa$ (capability growth per unit of agent-initiated resources) and two closure properties (maintenance and expansion) that convert ``self-improving AI'' into falsifiable criteria. We specify OWA-Bench, an open-world agency benchmark suite that evaluates long-horizon, tool-using, persistent agents. We define level gates for AAI-0\ldots AAI-4 using thresholds on the axes, $\kappa$, and closure proofs. Synthetic experiments illustrate how present-day systems map onto the scale and how the delegability frontier (quality vs.\ autonomy) advances with self-improvement. We also prove a theorem showing that, under sufficient conditions, an AAI-3 agent becomes AAI-5 over time, formalizing the intuition that a ``baby AGI'' becomes a superintelligence.
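As a minimal sketch of the two quantities named above, assume each axis score $a_i$ is normalized to $(0,1]$ and the weights $w_i$ are non-negative and sum to one. The symbols $a_i$, $w_i$, $\Delta C$, and $\Delta R$ are illustrative notation introduced here, not definitions taken from the text. Under those assumptions, the composite index and the self-improvement coefficient could be written as
\begin{equation}
\mathrm{AAI\mbox{-}Index} = \prod_{i=1}^{10} a_i^{\,w_i}, \qquad \sum_{i=1}^{10} w_i = 1, \qquad \kappa = \frac{\Delta C}{\Delta R},
\end{equation}
where $\Delta C$ denotes the capability gain over an evaluation window and $\Delta R$ the agent-initiated resources spent in that window. One property of this form is that a weighted geometric mean penalizes imbalance: a score near zero on any axis with positive weight pulls the index toward zero, whereas an arithmetic mean would let strong axes mask the weak one.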