Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives. Other researchers are skeptical, because human-like power-seeking instincts need not be present in RL agents. To clarify this debate, we develop the first formal theory of the statistical tendencies of optimal policies in reinforcement learning. In the context of Markov decision processes, we prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power over the environment. These symmetries exist in many environments in which the agent can be shut down or destroyed. We prove that for most prior beliefs one might have about the agent's reward function (including as a special case the situations where the reward function is known), one should expect optimal policies to seek power in these environments. These policies seek power by keeping a range of options available and, when the discount rate is sufficiently close to 1, by navigating towards larger sets of potential terminal states.
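The option-keeping claim can be illustrated with a toy sketch (not from the paper; all names and the MDP shape here are illustrative assumptions). In a deterministic MDP where action `a` leads to three absorbing terminal states and action `b` leads to one, sampling i.i.d. uniform rewards over the terminal states and asking which branch the optimal policy picks recovers the "most reward functions" flavor of the result:

```python
import random

def optimal_prefers_options(n_samples=20000, seed=0):
    """Toy check: action `a` reaches 3 absorbing terminal states,
    action `b` reaches 1.  For each sampled reward function, the
    optimal policy picks whichever branch leads to the best
    reachable terminal reward (both branches take one step, so
    discounting cancels).  Returns the fraction of sampled reward
    functions for which the optimal policy chooses `a`, the
    larger option set.  Illustrative sketch, not the paper's setup."""
    rng = random.Random(seed)
    prefer_a = 0
    for _ in range(n_samples):
        # i.i.d. uniform reward over the four terminal states:
        # r[0..2] reachable via `a`, r[3] reachable via `b`
        r = [rng.random() for _ in range(4)]
        if max(r[0], r[1], r[2]) > r[3]:
            prefer_a += 1
    return prefer_a / n_samples

frac = optimal_prefers_options()
```

Under this sampling, the probability that the best of three uniform draws beats a single draw is 3/4, so roughly 75% of reward functions favor the branch with more terminal options; this mirrors, in miniature, why optimal policies tend to navigate toward larger sets of potential terminal states.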