AI objectives are often hard to specify properly. Some approaches tackle this problem by regularizing the AI's side effects: agents must weigh "how much of a mess they make" against an imperfectly specified proxy objective. We propose a formal criterion for side effect regularization via the assistance game framework. In these games, the agent solves a partially observable Markov decision process (POMDP) representing its uncertainty about the objective function it should optimize. We consider the setting where the true objective is revealed to the agent at a later time step. We show that this POMDP is solved by trading off the proxy reward against the agent's ability to achieve a range of future tasks. We empirically demonstrate the reasonableness of our problem formalization via ground-truth evaluation in two gridworld environments.
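As an illustrative sketch only (not the paper's exact formulation), the trade-off can be written as: optimize the proxy reward until the true objective is revealed at some time step, after which the agent's value is the attainable value under whichever objective is revealed. The symbols below ($R_{\text{proxy}}$, the task distribution $\mathcal{D}$, the reveal time $T$, and the optimal value function $V^*_R$) are assumed notation for this sketch.

$$
% Minimal sketch under assumed notation: R_proxy is the specified proxy reward,
% D is the agent's distribution over true objectives revealed at time T,
% and V^*_R(s_T) is the optimal value attainable for objective R from state s_T.
\pi^* \in \arg\max_{\pi}\;
\mathbb{E}_{\pi}\!\left[
  \sum_{t=0}^{T-1} \gamma^{t}\, R_{\text{proxy}}(s_t)
  \;+\; \gamma^{T}\, \mathbb{E}_{R \sim \mathcal{D}}\!\left[ V^{*}_{R}(s_T) \right]
\right]
$$

Under this reading, the second term penalizes actions that destroy the agent's ability to achieve a range of possible future tasks, which is how the side effect regularization enters the objective.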