As progress in AI continues to advance, it is crucial to know how advanced systems will make choices and in what ways they may fail. Machines can already outsmart humans in some domains, and understanding how to safely build ones which may have capabilities at or above the human level is of particular concern. One might suspect that artificially generally intelligent (AGI) and artificially superintelligent (ASI) systems should be modeled as something which humans, by definition, can't reliably outsmart. As a challenge to this assumption, this paper presents the Achilles Heel hypothesis, which states that even a potentially superintelligent system may nonetheless have stable decision-theoretic delusions which cause it to make obviously irrational decisions in adversarial settings. In a survey of relevant dilemmas and paradoxes from the decision theory literature, a number of these potential Achilles Heels are discussed in the context of this hypothesis. Several novel contributions are made toward understanding the ways in which these weaknesses might be implanted into a system.