An impossibility theorem demonstrates that a particular problem, or class of problems, cannot be solved as described in the claim. Such theorems place limits on what artificial intelligence, and superintelligent AI in particular, can do. As such, these results serve as guidelines, reminders, and warnings for researchers in AI safety, AI policy, and governance. They may enable progress on some long-standing questions by formalizing theories within the framework of constraint satisfaction, without committing to a single option; we strongly believe this to be the most prudent approach to long-term AI safety initiatives. In this paper, we categorize impossibility theorems applicable to AI into five mechanism-based categories: deduction, indistinguishability, induction, tradeoffs, and intractability. We find that certain theorems are too specific or carry implicit assumptions that limit their applicability. We also contribute new results (theorems), such as the unfairness of explainability, the first explainability-related result in the induction category; our remaining results concern misalignment between clones and place a limit on the self-awareness of agents. We conclude that deductive impossibilities rule out 100% guarantees for security. Finally, we outline promising directions in explainability, controllability, value alignment, ethics, and group decision-making that merit further investigation.
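As a rough illustration of what framing safety requirements as constraint satisfaction (rather than committing to a single optimized option) can look like, consider the minimal sketch below. It is a generic example, not the paper's formalism: the design variables, their domains, and the constraints are hypothetical placeholders.

```python
# Minimal sketch: treat safety requirements as a constraint satisfaction
# problem (find ALL assignments meeting every constraint) instead of
# committing to one "optimal" option. Variables and constraints here are
# hypothetical, purely for illustration.
from itertools import product

# Hypothetical design choices for an AI system, each with a small domain.
domains = {
    "oversight":    ["none", "human-in-the-loop", "tripwire"],
    "capability":   ["narrow", "general"],
    "transparency": ["opaque", "interpretable"],
}

# Hypothetical safety constraints: each maps an assignment to True/False.
constraints = [
    lambda a: a["oversight"] != "none",               # some oversight required
    lambda a: not (a["capability"] == "general"
                   and a["transparency"] == "opaque"),  # no opaque general systems
]

def satisfying_assignments(domains, constraints):
    """Enumerate every assignment that satisfies all constraints."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            yield assignment

# Keep the whole feasible set open instead of committing to one option.
for a in satisfying_assignments(domains, constraints):
    print(a)
```

The point of the sketch is the design choice: the output is the feasible set of configurations consistent with the stated constraints, leaving the final commitment to later deliberation.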