Transferring learned policies to new environments with different visual inputs is important for deep reinforcement learning (DRL) algorithms. In this paper, we introduce Prompt-based Proximal Policy Optimization ($P^{3}O$), a three-stage DRL algorithm that transfers visual representations from a target environment to a source environment via prompting. The three stages of $P^{3}O$ are pre-training, prompting, and predicting. Specifically, we design a prompt-transformer for representation conversion and propose a two-step training process that trains the prompt-transformer for the target environment, while the rest of the DRL pipeline remains unchanged. We implement $P^{3}O$ and evaluate it on the OpenAI CarRacing video game. The experimental results show that $P^{3}O$ outperforms state-of-the-art visual transfer schemes. In particular, $P^{3}O$ enables the learned policies to perform well in environments with different visual inputs, which is much more effective than retraining the policies in those environments.
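To make the three-stage structure concrete, the following is a minimal sketch of the idea in PyTorch. It assumes a frozen, PPO-pre-trained visual encoder and policy head from the source environment, with only a small prompt-transformer trained for the target environment; all names (`PromptTransformer`, `encoder`, `policy_head`) and layer choices are hypothetical illustrations, not the authors' actual implementation.

```python
# Hypothetical sketch of the P^3O pipeline; not the authors' implementation.
import torch
import torch.nn as nn

class PromptTransformer(nn.Module):
    """Illustrative prompt module that maps target-environment observations
    toward the representation space the pre-trained (source) encoder expects."""
    def __init__(self, channels: int = 3):
        super().__init__()
        # A small convolutional adapter; the real prompt-transformer may differ.
        self.adapter = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, channels, kernel_size=3, padding=1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Residual prompting: perturb the target observation toward the source style.
        return obs + self.adapter(obs)

# Stage 1 (pre-training): encoder and policy head are trained with PPO in the
# source environment; here they are stand-ins whose weights are then frozen.
encoder = nn.Sequential(nn.Conv2d(3, 8, kernel_size=4, stride=4), nn.Flatten())
policy_head = nn.Linear(8 * 24 * 24, 5)  # e.g. 5 discrete actions
for p in list(encoder.parameters()) + list(policy_head.parameters()):
    p.requires_grad_(False)

# Stage 2 (prompting): only the prompt-transformer is optimized for the target
# environment, while the rest of the DRL pipeline stays unchanged.
prompt = PromptTransformer()
optimizer = torch.optim.Adam(prompt.parameters(), lr=1e-4)

# Stage 3 (predicting): target observations pass through the prompt-transformer
# before the frozen encoder and policy head produce actions.
target_obs = torch.randn(1, 3, 96, 96)  # a dummy 96x96 RGB frame, CarRacing-sized
logits = policy_head(encoder(prompt(target_obs)))
action = torch.argmax(logits, dim=-1)
```

Under these assumptions, the key design point is that gradient updates in the target environment touch only the prompt-transformer's parameters, so the source-trained encoder and policy are reused as-is.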