As AI continues to advance, it is important to understand how advanced systems will make choices and in what ways they may fail. Machines can already outsmart humans in some domains, so understanding how to safely build systems with capabilities at or above the human level is of particular concern. One might suspect that artificially generally intelligent (AGI) and artificially superintelligent (ASI) systems will be ones that humans cannot reliably outsmart. As a challenge to this assumption, this paper presents the Achilles Heel hypothesis, which states that even a potentially superintelligent system may nonetheless have stable decision-theoretic delusions that cause it to make irrational decisions in adversarial settings. In a survey of key dilemmas and paradoxes from the decision theory literature, a number of these potential Achilles Heels are discussed in the context of this hypothesis. Several novel contributions are made toward understanding the ways in which these weaknesses might be implanted into a system.