We present MEM: Multi-view Exploration Maximization for tackling complex visual control tasks. To the best of our knowledge, MEM is the first approach that combines multi-view representation learning and intrinsic reward-driven exploration in reinforcement learning (RL). More specifically, MEM first extracts the view-specific and shared information of multi-view observations to form high-quality features before performing RL on the learned features, enabling the agent to fully comprehend the environment and yield better actions. Furthermore, MEM transforms the multi-view features into intrinsic rewards based on entropy maximization to encourage exploration. As a result, MEM can significantly improve the sample efficiency and generalization ability of the RL agent, facilitating the solution of real-world problems with high-dimensional observations and sparse rewards. We evaluate MEM on various tasks from the DeepMind Control Suite and Procgen games. Extensive simulation results demonstrate that MEM achieves superior performance and outperforms the benchmarking schemes with a simpler architecture and higher efficiency.
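The abstract states that MEM converts the learned multi-view features into intrinsic rewards via entropy maximization, but does not specify the estimator here. The sketch below illustrates one common choice from the exploration literature, a particle-based k-nearest-neighbor entropy estimate, where the reward grows with the distance to the k-th neighbor in feature space; the function name, signature, and scaling are hypothetical and serve only to make the idea concrete.

```python
# Hedged sketch of an entropy-maximization intrinsic reward over multi-view
# features. This is an assumption-based illustration, not the paper's exact
# formulation: it uses a k-NN particle estimate of state entropy.
import torch


def knn_intrinsic_rewards(features: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Per-sample intrinsic reward from a batch of fused multi-view features.

    features: (batch, dim) tensor of learned multi-view representations.
    Returns a (batch,) tensor; larger values mark less-visited regions.
    """
    # Pairwise Euclidean distances within the batch: (batch, batch).
    dists = torch.cdist(features, features, p=2)
    # Distance to the k-th nearest neighbor (skip index 0, the point itself).
    knn_dist = dists.topk(k + 1, largest=False).values[:, -1]
    # log(1 + d_k) keeps the reward non-negative and numerically stable.
    return torch.log1p(knn_dist)


# Usage (hypothetical): add the scaled intrinsic reward to the task reward
# before the RL update, e.g.
#   total_reward = extrinsic_reward + beta * knn_intrinsic_rewards(mv_feats)
```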