Training a model-free deep reinforcement learning agent to solve image-to-image translation is difficult because it involves high-dimensional continuous state and action spaces. In this paper, we draw inspiration from the recent success of the maximum entropy reinforcement learning framework, designed for challenging continuous control problems, to develop stochastic policies over high-dimensional continuous spaces that handle image representation, generation, and control simultaneously. Central to this method is the Stochastic Actor-Executor-Critic (SAEC), an off-policy actor-critic model with an additional executor that generates realistic images. Specifically, the actor focuses on the high-level representation and control policy through a stochastic latent action, and explicitly directs the executor to generate low-level actions that manipulate the state. Experiments on several image-to-image translation tasks demonstrate the effectiveness and robustness of the proposed SAEC on high-dimensional continuous-space problems.
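To make the described architecture concrete, the following is a minimal PyTorch sketch of how the three SAEC components could fit together: an actor that emits a stochastic latent action via a reparameterized Gaussian (as in SAC-style maximum entropy RL), an executor that decodes that latent action into an image, and a critic that scores the state-action pair. The class names, network sizes, the entropy temperature `alpha`, and the actor objective are illustrative assumptions based on the abstract, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Maps an image state to a stochastic latent action (diagonal Gaussian)."""

    def __init__(self, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(64, latent_dim)
        self.log_std = nn.Linear(64, latent_dim)

    def forward(self, state):
        h = self.encoder(state)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-5, 2)
        std = log_std.exp()
        z = mu + std * torch.randn_like(std)  # reparameterized sample
        # Gaussian log-probability of the latent action, up to an additive constant
        log_prob = (-0.5 * (((z - mu) / std) ** 2 + 2 * log_std)).sum(-1)
        return z, log_prob


class Executor(nn.Module):
    """Decodes the latent action into the low-level action: an output image.

    Schematic decoder only; a real model would upsample to full resolution.
    """

    def __init__(self, latent_dim=128):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.decoder(z[:, :, None, None])  # broadcast latent to a 1x1 map


class Critic(nn.Module):
    """Q(state, latent action): evaluates the actor's high-level decision."""

    def __init__(self, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.q = nn.Sequential(
            nn.Linear(32 + latent_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, state, z):
        return self.q(torch.cat([self.encoder(state), z], dim=-1))


if __name__ == "__main__":
    state = torch.randn(4, 3, 64, 64)       # a batch of input images
    actor, executor, critic = Actor(), Executor(), Critic()
    z, log_prob = actor(state)               # stochastic latent action
    image = executor(z)                      # low-level action: generated image
    q = critic(state, z)
    alpha = 0.2                              # entropy temperature (assumed value)
    # Maximum-entropy actor objective in the style of SAC updates
    actor_loss = (alpha * log_prob - q.squeeze(-1)).mean()
    print(image.shape, actor_loss.item())
```

In this sketch the executor is conditioned only on the latent action; whether it also receives the state directly, and how the image-quality reward is defined, are details the abstract leaves unspecified.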