We present a novel method for learning style-agnostic representations using both style transfer and adversarial learning in the reinforcement learning framework. The style, here, refers to task-irrelevant details such as the color of the background in the images, where generalizing the learned policy across environments with different styles is still a challenge. Focusing on learning style-agnostic representations, our method trains the actor with diverse image styles generated from an inherent adversarial style perturbation generator, playing a min-max game between the actor and the generator, without demanding expert knowledge for data augmentation or additional class labels for adversarial training. We verify that our method achieves performance competitive with or better than state-of-the-art approaches on the Procgen and Distracting Control Suite benchmarks, and further investigate the features extracted by our model, showing that the model better captures the invariants and is less distracted by the shifted styles. The code is available at https://github.com/POSTECH-CVLab/style-agnostic-RL.
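The min-max dynamic described above can be illustrated with a toy sketch (this is an assumption-laden simplification, not the paper's implementation): a linear "actor" predicts a target from features, while a "style perturbation generator" parameterized by a bounded shift `d` perturbs those features to maximize the actor's loss, which the actor in turn minimizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy min-max game (hypothetical names, not the paper's code):
#   generator:  maximize  L(w, z + d)   -- make the "style" as hard as possible
#   actor:      minimize  L(w, z + d)   -- stay robust to the perturbed style
z = rng.normal(size=4)          # clean feature vector
target = 1.0                    # supervision signal for the toy loss
w = rng.normal(size=4) * 0.1    # actor weights
d = np.zeros(4)                 # style perturbation (generator's parameters)

lr_actor, lr_gen, eps = 0.05, 0.1, 0.3


def loss(w, d):
    """Squared error of the actor's prediction on style-perturbed features."""
    return (w @ (z + d) - target) ** 2


for step in range(200):
    # generator: gradient *ascent* on d, confined to an eps-ball
    err = w @ (z + d) - target
    d = np.clip(d + lr_gen * 2 * err * w, -eps, eps)
    # actor: gradient descent on w under the perturbed features
    err = w @ (z + d) - target
    w = w - lr_actor * 2 * err * (z + d)

print(loss(w, d))               # loss on adversarially styled features
print(loss(w, np.zeros(4)))     # loss on clean features
```

Alternating updates like these are the standard way such adversarial objectives are optimized; the generator's perturbation budget (`eps` here) stands in for constraining the style shift to stay task-irrelevant.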