This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance constraint, we can reduce the size of the solution space. We specifically focus on group-structured symmetries (invertible transformations). Additionally, we introduce an easy method for constructing equivariant network layers numerically, so the system designer need not solve the constraints by hand, as is typically done. We construct MDP homomorphic MLPs and CNNs that are equivariant under either a group of reflections or rotations. We show that such networks converge faster than unstructured baselines on CartPole, a grid world and Pong.
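A minimal sketch of the idea of numerically constructing an equivariant layer, using my own illustrative setup rather than the paper's actual code: a random weight matrix is averaged ("symmetrized") over a two-element reflection group acting on a CartPole-like state space and on the action logits, after which the equivariance constraint holds by construction and can be checked numerically.

```python
import numpy as np

# Hypothetical CartPole-like setup (illustration only): states transform by a
# sign flip L_g, and the two actions (left/right) swap under K_g.
state_dim, action_dim = 4, 2
L = [np.eye(state_dim), -np.eye(state_dim)]                # group action on states
K = [np.eye(action_dim), np.array([[0., 1.], [1., 0.]])]   # group action on action logits

def symmetrize(W, L, K):
    """Average W over the group so that K_g @ W == W @ L_g for every group element g."""
    return sum(np.linalg.inv(Kg) @ W @ Lg for Lg, Kg in zip(L, K)) / len(L)

# Start from an unconstrained random layer and project it onto the equivariant subspace.
W = symmetrize(np.random.randn(action_dim, state_dim), L, K)

# Verify the equivariance constraint numerically for all group elements.
for Lg, Kg in zip(L, K):
    assert np.allclose(Kg @ W, W @ Lg)
```

The same group-averaging argument extends to any finite group of invertible state-action transformations; the designer only has to specify the group representations, not solve the constraint equations by hand.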