Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives. Other researchers point out that RL agents need not have human-like power-seeking instincts. To clarify this discussion, we develop the first formal theory of the statistical tendencies of optimal policies. In the context of Markov decision processes, we prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power over the environment. These symmetries exist in many environments in which the agent can be shut down or destroyed. We prove that in these environments, most reward functions make it optimal to seek power by keeping a range of options available and, when maximizing average reward, by navigating towards larger sets of potential terminal states.
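The average-reward claim can be illustrated with a toy Monte Carlo sketch. This is not the paper's formal construction; it rests on simplifying assumptions that are ours: the agent makes one choice between a branch leading to a small set of terminal states and a branch leading to a larger set, every terminal state is absorbing (so the average reward of a trajectory equals the reward of the terminal state it reaches), and rewards are drawn independently and uniformly from [0, 1]. The function name and parameters are hypothetical.

```python
import random


def fraction_preferring_larger_set(n_small=1, n_large=3, n_samples=100_000, seed=0):
    """Estimate how often the larger branch is optimal under average reward.

    Assumption (illustrative only): each terminal state's reward is drawn
    i.i.d. from Uniform(0, 1), and the agent can reach any terminal state
    inside the branch it chooses.
    """
    rng = random.Random(seed)
    prefers_large = 0
    for _ in range(n_samples):
        # Sample one reward function over all terminal states.
        small = [rng.random() for _ in range(n_small)]
        large = [rng.random() for _ in range(n_large)]
        # The optimal average-reward policy heads for the best terminal state,
        # so the larger branch is optimal when it contains the overall maximum.
        if max(large) > max(small):
            prefers_large += 1
    return prefers_large / n_samples


estimate = fraction_preferring_larger_set()
print(f"fraction of sampled reward functions preferring the larger set: {estimate:.3f}")
```

Under these assumptions the best terminal state is equally likely to be any of the `n_small + n_large` states, so the estimate should be close to `n_large / (n_small + n_large)`, which is 3/4 for the defaults. The larger set of terminal states is preferred by most sampled reward functions simply because it is more likely to contain the best one.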