Evolutionary strategies have recently been shown to achieve competitive performance on complex optimization problems in reinforcement learning. In such problems, one often needs to optimize an objective function subject to a set of constraints, such as constraints on the entropy of a policy or constraints restricting the set of actions or states accessible to an agent. However, convergence guarantees for evolutionary strategies on stochastic constrained problems are lacking in the literature. In this work, we address this problem by designing a novel optimization algorithm with a sufficient decrease mechanism that ensures convergence and relies only on estimates of the objective and constraint functions. We demonstrate the applicability of this algorithm on two types of experiments: i) a control task for maximizing rewards and ii) maximizing rewards subject to a non-relaxable set of constraints.
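To make the "sufficient decrease mechanism based only on estimates of the functions" concrete, here is a minimal sketch, not the authors' algorithm: a simple (1+1)-ES-style loop on a noisy toy objective (a stand-in for a stochastic rollout return), where a candidate is accepted only if its estimated decrease exceeds a threshold, and candidates violating a non-relaxable constraint are rejected outright. All names (`noisy_objective`, `constraint`, the step-size schedule, the threshold `gamma`) are illustrative assumptions.

```python
import numpy as np

def noisy_objective(x, rng):
    # Noisy estimate of the sphere function (a stand-in for a stochastic
    # reward signal; only estimates of the objective are available).
    return float(np.sum(x ** 2) + 0.01 * rng.standard_normal())

def constraint(x):
    # Illustrative non-relaxable constraint: iterates must stay in the unit ball.
    return float(np.linalg.norm(x)) <= 1.0

def es_sufficient_decrease(x0, sigma=0.3, gamma=1e-4, max_iter=500, seed=0):
    """(1+1)-ES-style search: a step is accepted only when the estimated
    decrease exceeds a sufficient-decrease threshold gamma * sigma**2."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = noisy_objective(x, rng)
    for _ in range(max_iter):
        cand = x + sigma * rng.standard_normal(x.shape)
        if not constraint(cand):
            continue  # infeasible candidates are rejected, never relaxed
        f_cand = noisy_objective(cand, rng)
        if f_cand <= fx - gamma * sigma ** 2:  # sufficient decrease test
            x, fx = cand, f_cand               # accept the step
        else:
            sigma *= 0.98                      # shrink step size on rejection
    return x, fx

x_best, f_best = es_sufficient_decrease(np.array([0.5, -0.3]))
```

The sufficient-decrease test guards against accepting steps whose apparent improvement is only estimation noise, which is the intuition behind requiring a decrease margin rather than any decrease at all.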