When AI agents do not align their actions with human values, they may cause serious harm. One way to solve the value alignment problem is to include a human operator who monitors all of the agent's actions. Although this solution guarantees maximal safety, it is highly inefficient, since it requires the operator to devote all of their attention to the agent. In this paper, we propose a far more efficient solution that allows the operator to engage in other activities without neglecting the monitoring task. In our approach, the AI agent requests the operator's permission only for critical actions, that is, potentially harmful actions. We introduce the concept of critical actions with respect to AI safety and discuss how to build a model that measures action criticality. We also discuss how the operator's feedback can be used to make the agent smarter.
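A minimal sketch of the permission-gated action loop described above might look as follows. The `criticality` model, the `threshold` cutoff, and the `ask_operator` interface are hypothetical placeholders introduced for illustration, not the paper's concrete implementation:

```python
# Sketch of a criticality-gated agent: non-critical actions execute directly,
# while actions the criticality model flags as potentially harmful are
# deferred to the human operator. All names here are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SupervisedAgent:
    criticality: Callable[[object, object], float]  # maps (state, action) -> score in [0, 1]
    ask_operator: Callable[[object, object], bool]  # human yes/no decision on a proposed action
    threshold: float = 0.5                          # hypothetical cutoff for "critical"
    feedback: List[Tuple[object, object, bool]] = field(default_factory=list)

    def act(self, state, action, execute):
        """Execute non-critical actions autonomously; defer critical ones to the operator."""
        if self.criticality(state, action) >= self.threshold:
            approved = self.ask_operator(state, action)
            # Record the operator's decision as a labeled example, so the
            # criticality model can later be retrained ("made smarter").
            self.feedback.append((state, action, approved))
            if not approved:
                return None  # blocked: potentially harmful action
        return execute(state, action)
```

Under this reading, the threshold governs the trade-off the abstract describes: raising it frees the operator for other activities at the cost of more unsupervised actions, while lowering it approaches the fully monitored, maximally safe but inefficient regime.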