This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire -- especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans. Second, I formulate and evaluate a more specific six-premise argument that creating agents of this kind will lead to existential catastrophe by 2070. On this argument, by 2070: (1) it will become possible and financially feasible to build relevantly powerful and agentic AI systems; (2) there will be strong incentives to do so; (3) it will be much harder to build aligned (and relevantly powerful/agentic) AI systems than to build misaligned (and relevantly powerful/agentic) AI systems that are still superficially attractive to deploy; (4) some such misaligned systems will seek power over humans in high-impact ways; (5) this problem will scale to the full disempowerment of humanity; and (6) such disempowerment will constitute an existential catastrophe. I assign rough subjective credences to the premises in this argument, and I end up with an overall estimate of ~5% that an existential catastrophe of this kind will occur by 2070. (May 2022 update: since making this report public in April 2021, my estimate here has gone up, and is now at >10%.)
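To make the structure of that estimate concrete, here is a minimal sketch of how rough credences on the six premises combine multiplicatively into an overall figure. The specific numbers below are illustrative placeholders chosen so the product lands near ~5%; they are not a restatement of the report's own premise-by-premise credences.

```python
# Minimal sketch: combining credences on six conjunctive premises into an
# overall estimate by multiplication. The numbers are illustrative
# placeholders, not the report's assigned credences.
from math import prod

premise_credences = {
    "1. powerful, agentic AI becomes possible and financially feasible": 0.65,
    "2. strong incentives to build such systems":                         0.80,
    "3. aligned systems much harder to build than deployable misaligned ones": 0.40,
    "4. some misaligned systems seek power in high-impact ways":          0.65,
    "5. the problem scales to full disempowerment of humanity":           0.40,
    "6. such disempowerment constitutes an existential catastrophe":      0.95,
}

overall = prod(premise_credences.values())
print(f"Overall credence: {overall:.1%}")  # roughly 5%
```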