Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives. Other researchers are skeptical, because RL agents need not have human-like power-seeking instincts. To clarify this debate, we develop the first formal theory of the statistical tendencies of optimal policies. In the context of Markov decision processes, we prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power over the environment. These symmetries exist in many environments in which the agent can be shut down or destroyed. We prove that in these environments, most reward functions make it optimal to seek power by keeping a range of options available and, when maximizing average reward, by navigating towards larger sets of potential terminal states.
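The claim about average reward can be illustrated with a toy case. This is a hypothetical sketch, not the paper's construction or proof. It assumes a deterministic MDP in which the agent makes one choice at the start state. Branch A leads to `n_small` absorbing terminal states and branch B leads to `n_large` of them. It also assumes that each terminal state's reward is drawn i.i.d. from Uniform(0, 1).

Under the average-reward criterion, rewards collected before absorption do not affect the limit. The value of a branch is therefore the best terminal reward reachable through it. Because the rewards are i.i.d. and continuous, the overall maximum is equally likely to sit at any terminal state. As a result, the larger branch is optimal with probability `n_large / (n_small + n_large)`. The function names below are illustrative only.

```python
import random


def fraction_prefer_larger_set(n_small: int, n_large: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of how often heading to the larger terminal set is optimal.

    Hypothetical toy setup: absorbing terminal states, average-reward criterion,
    and i.i.d. Uniform(0, 1) rewards on terminal states (an assumption, not the
    paper's general setting).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Sample one reward function over all terminal states.
        small = [rng.random() for _ in range(n_small)]
        large = [rng.random() for _ in range(n_large)]
        # Average-reward value of a branch = best terminal reward reachable through it.
        if max(large) > max(small):
            wins += 1
    return wins / trials


def exact_fraction(n_small: int, n_large: int) -> float:
    """Closed form for the same toy setup.

    With i.i.d. continuous rewards, the maximum is equally likely to be at any
    terminal state, so it lies in the larger set with this probability.
    """
    return n_large / (n_small + n_large)


if __name__ == "__main__":
    # One option versus two: the estimate should be close to 2/3.
    print(fraction_prefer_larger_set(1, 2, 100_000), exact_fraction(1, 2))
```

In this toy case, the larger set of terminal states is optimal for most sampled reward functions. The paper establishes the general tendency through environmental symmetries rather than through a specific reward distribution like this one.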