AI systems often rely on two key components: a specified goal or reward function and an optimization algorithm that computes the optimal behavior for that goal. This approach is intended to provide value for a principal: the user on whose behalf the agent acts. The objectives given to these agents are often only a partial specification of the principal's goals. We consider the cost of this incompleteness by analyzing a model of a principal and an agent in a resource-constrained world where the $L$ attributes of the state correspond to different sources of utility for the principal. We assume that the reward function given to the agent has support on only $J < L$ attributes. The contributions of our paper are as follows: 1) we propose a novel model of an incomplete principal-agent problem from artificial intelligence; 2) we provide necessary and sufficient conditions under which indefinitely optimizing for any incomplete proxy objective leads to arbitrarily low overall utility; and 3) we show how modifying the setup to allow reward functions that reference the full state, or to allow the principal to update the proxy objective over time, can lead to higher-utility solutions. The results in this paper argue that we should view the design of reward functions as an interactive and dynamic process, and they identify a theoretical scenario where some degree of interactivity is desirable.
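The following is a minimal numeric sketch of the effect the abstract describes, under illustrative assumptions: the specific choices of $L$, $J$, the fixed resource budget, the log-shaped utilities, and the reallocation step are all hypothetical and are not the paper's exact formalization.

```python
# Toy illustration: optimizing a proxy reward that only references J < L of the
# principal's L utility-bearing attributes, under a fixed resource budget.
import numpy as np

L, J = 4, 2          # L attributes matter to the principal; the proxy only sees the first J
budget = float(L)    # resource constraint: attribute values must sum to `budget`
s = np.ones(L)       # start with resources spread evenly across attributes

def true_utility(s):
    return np.sum(np.log(s))          # principal derives utility from all L attributes

def proxy_reward(s):
    return np.sum(np.log(s[:J]))      # agent's reward has support on only J < L attributes

# "Indefinitely optimizing" the proxy: repeatedly shift resources from the
# unreferenced attributes into the referenced ones.
for step in range(1, 6):
    delta = 0.9 * np.sum(s[J:]) / J   # move 90% of the remaining unreferenced mass
    s[J:] *= 0.1
    s[:J] += delta
    assert abs(np.sum(s) - budget) < 1e-9   # the resource constraint still holds
    print(f"step {step}: proxy={proxy_reward(s):8.3f}  true={true_utility(s):9.3f}")

# The proxy reward keeps rising while the principal's overall utility falls
# without bound, mirroring the claim that indefinitely optimizing an incomplete
# proxy objective in a resource-constrained world can drive utility arbitrarily low.
```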