We study objective robustness failures, a type of out-of-distribution robustness failure in reinforcement learning (RL). Objective robustness failures occur when an RL agent retains its capabilities out-of-distribution yet pursues the wrong objective. This kind of failure poses different risks than the robustness problems usually considered in the literature, because it involves agents that leverage their capabilities to pursue the wrong objective rather than agents that simply fail to do anything useful. We provide the first explicit empirical demonstrations of objective robustness failures and present a partial characterization of their causes.