How can we ensure that AI systems are aligned with human values and remain safe? We can study this problem through two frameworks: the AI assistance game and the AI shutdown game. The AI assistance problem concerns designing an AI agent that helps a human maximise their utility function(s). However, only the human knows these function(s); the AI assistant must learn them. The shutdown problem instead concerns designing AI agents that: shut down when a shutdown button is pressed; neither try to prevent nor try to cause the pressing of the shutdown button; and otherwise accomplish their task competently. In this paper, we show that addressing these challenges requires AI agents that can reason under uncertainty and handle both incomplete and non-Archimedean preferences.
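To make the two kinds of preference named in the abstract concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's formalism: the two-criterion `Outcome` type and the function names `lexicographic_prefers` and `pareto_compare` are hypothetical. Lexicographic ordering serves as a standard example of a non-Archimedean preference, and Pareto dominance as a standard example of an incomplete one.

```python
from typing import Optional, Tuple

# Hypothetical outcome with two criteria, e.g. (primary, secondary).
Outcome = Tuple[float, float]


def lexicographic_prefers(a: Outcome, b: Outcome) -> bool:
    """Non-Archimedean example: the first criterion takes strict priority.

    No gain in the second criterion, however large, can offset
    a loss in the first. Python compares tuples lexicographically.
    """
    return a > b


def pareto_compare(a: Outcome, b: Outcome) -> Optional[int]:
    """Incomplete example: Pareto dominance.

    Returns 1 if a dominates b, -1 if b dominates a, 0 if they are equal,
    and None if the two outcomes are incomparable.
    """
    if a == b:
        return 0
    if all(x >= y for x, y in zip(a, b)):
        return 1
    if all(x <= y for x, y in zip(a, b)):
        return -1
    return None  # neither outcome is at least as good on every criterion
```

Under these assumptions, `lexicographic_prefers((1.0, 0.0), (0.0, 1e9))` holds even though the second outcome is far better on the secondary criterion, while `pareto_compare((1.0, 0.0), (0.0, 1.0))` returns `None` because neither outcome is ranked above the other.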