Offline reinforcement learning (offline RL) suffers from an inherent distributional shift, as it cannot interact with the physical environment during training. To alleviate this limitation, state-based offline RL learns a dynamics model from the logged experience and augments the data with predicted state transitions to extend the data distribution. To exploit this benefit in image-based RL as well, we first propose a generative model, S2P (State2Pixel), which synthesizes raw pixel observations of the agent from the corresponding state. S2P bridges the gap between the state and image domains in RL algorithms and enables virtual exploration of unseen image distributions via model-based transitions in the state space. Through experiments, we confirm that our S2P-based image synthesis not only improves image-based offline RL performance but also shows strong generalization capability on unseen tasks.