It is important for deep reinforcement learning (DRL) algorithms to transfer their learned policies to new environments that have different visual inputs. In this paper, we introduce Prompt-based Proximal Policy Optimization ($P^{3}O$), a three-stage DRL algorithm that transfers visual representations from a target environment to a source environment via prompting. $P^{3}O$ consists of three stages: pre-training, prompting, and predicting. In particular, we specify a prompt-transformer for representation conversion and propose a two-step training process that trains the prompt-transformer for the target environment while the rest of the DRL pipeline remains unchanged. We implement $P^{3}O$ and evaluate it on the OpenAI CarRacing video game. The experimental results show that $P^{3}O$ outperforms state-of-the-art visual transfer schemes. In particular, $P^{3}O$ allows the learned policies to perform well in environments with different visual inputs, which is much more effective than retraining the policies in those environments.
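To make the representation-conversion idea concrete, the minimal sketch below shows one plausible way a prompt-transformer could sit in front of a frozen, source-trained encoder and policy, so that only the prompt module is optimized for the target environment. The module names, architecture, and shapes (`PromptTransformer`, `FrozenSourcePolicy`, 84x84 frames) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the prompt-transformer idea: a small trainable module
# maps target-environment observations into inputs the frozen, source-trained
# encoder/policy expects. Architecture and shapes are assumptions for illustration.
import torch
import torch.nn as nn


class PromptTransformer(nn.Module):
    """Trainable front-end that converts target-domain frames before the frozen policy."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, channels, kernel_size=3, padding=1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Add a learned residual "prompt" to the raw observation.
        return obs + self.net(obs)


class FrozenSourcePolicy(nn.Module):
    """Stand-in for the encoder + policy head pre-trained in the source environment."""

    def __init__(self, channels: int = 3, n_actions: int = 5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Flatten(),
        )
        # For 84x84 inputs the conv output is 32 x 20 x 20 = 12800 features.
        self.head = nn.Linear(12800, n_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(obs))


if __name__ == "__main__":
    policy = FrozenSourcePolicy()
    prompt = PromptTransformer()

    # Everything trained in the source environment stays frozen; only the
    # prompt-transformer is trained for the target environment.
    for p in policy.parameters():
        p.requires_grad_(False)
    optimiser = torch.optim.Adam(prompt.parameters(), lr=1e-4)

    target_obs = torch.rand(8, 3, 84, 84)   # batch of target-domain frames
    logits = policy(prompt(target_obs))     # act through the frozen policy
    print(logits.shape)                     # torch.Size([8, 5])
```

In this sketch the gradient only flows into `PromptTransformer`, which mirrors the abstract's claim that the rest of the DRL pipeline remains unchanged; how the prompt module is actually trained (the two-step process) is specified in the paper itself.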