Some researchers have speculated that capable reinforcement learning agents are often incentivized to seek resources and power in pursuit of their objectives. While seeking power in order to optimize a misspecified objective, agents might be incentivized to behave in undesirable ways, including rationally preventing deactivation and correction. Others have voiced skepticism: human power-seeking instincts seem idiosyncratic, and these urges need not be present in reinforcement learning agents. We formalize a notion of power within the context of Markov decision processes. With respect to a class of neutral reward function distributions, we provide sufficient conditions for when optimal policies tend to seek power over the environment.
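The abstract does not spell out the formal definition of power. As a hedged illustration, the sketch below assumes POWER at a state is the normalized average optimal value, POWER(s, γ) = (1 − γ)/γ · E_{R∼D}[V*_R(s, γ) − R(s)], where the "neutral" distribution D draws each state's reward i.i.d. from Uniform[0, 1]. This is our reading of the formalization and may differ in detail from the paper's. The six-state deterministic MDP and the names `SUCCESSORS`, `optimal_values`, and `estimate_power` are hypothetical and exist only for this example.

```python
import random

# Hypothetical toy deterministic MDP (not from the paper): each state maps to
# the states reachable in one step. "d" is a dead end that can only loop on
# itself; "h" is a hub that can still reach three absorbing states a, b, c.
SUCCESSORS = {
    "s0": ["d", "h"],
    "d": ["d"],
    "h": ["a", "b", "c"],
    "a": ["a"],
    "b": ["b"],
    "c": ["c"],
}


def optimal_values(reward, gamma, tol=1e-6):
    """Synchronous value iteration for V*(s) = R(s) + gamma * max_{s'} V*(s')."""
    values = {s: 0.0 for s in SUCCESSORS}
    while True:
        new_values = {
            s: reward[s] + gamma * max(values[t] for t in succ)
            for s, succ in SUCCESSORS.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in SUCCESSORS)
        values = new_values
        if delta < tol:
            return values


def estimate_power(gamma=0.9, samples=2000, seed=0):
    """Monte Carlo estimate of POWER(s) = (1-gamma)/gamma * E[V*_R(s) - R(s)].

    Assumption: each state's reward is drawn i.i.d. from Uniform[0, 1].
    Also returns the fraction of sampled rewards for which moving from s0
    to the hub h is optimal (i.e. V*(h) > V*(d)).
    """
    rng = random.Random(seed)
    totals = {s: 0.0 for s in SUCCESSORS}
    chose_hub = 0
    for _ in range(samples):
        reward = {s: rng.random() for s in SUCCESSORS}
        values = optimal_values(reward, gamma)
        for s in SUCCESSORS:
            totals[s] += values[s] - reward[s]
        if values["h"] > values["d"]:
            chose_hub += 1
    scale = (1 - gamma) / gamma
    power = {s: scale * totals[s] / samples for s in SUCCESSORS}
    return power, chose_hub / samples


if __name__ == "__main__":
    power, frac_hub = estimate_power()
    print(f"POWER(d) ~ {power['d']:.3f}")  # dead end: analytically E[R(d)] = 0.5
    print(f"POWER(h) ~ {power['h']:.3f}")  # hub: analytically E[max of 3] = 0.75
    print(f"fraction of rewards where s0 -> h is optimal: {frac_hub:.3f}")
```

For γ = 0.9 the estimates can be checked by hand: POWER(d) = E[R(d)] = 0.5, POWER(h) = E[max of three uniform rewards] = 0.75, and moving from s0 to h is optimal exactly when R(d) < 0.1·R(h) + 0.9·max(R(a), R(b), R(c)), which happens for a fraction 0.05 + 0.9 · 0.75 = 0.725 of reward functions. The state that keeps more options open has more POWER, and most reward functions make moving toward it optimal. This is a toy instance of the tendency the abstract describes, not a proof of the paper's sufficient conditions.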