Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries. Most algorithms for planning and model-based reinforcement learning (MBRL) do not take this rich geometric structure into account, leading to sample inefficiency and poor generalization. We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for MBRL and planning that is equivariant with respect to the product of the spatial symmetry group $\mathrm{SE(3)}$, the discrete-time translation group $\mathbb{Z}$, and the object permutation group $\mathrm{S}_n$. EDGI follows the Diffuser framework (Janner et al. 2022) in treating both learning a world model and planning in it as a conditional generative modeling problem, training a diffusion model on an offline trajectory dataset. We introduce a new $\mathrm{SE(3)} \times \mathbb{Z} \times \mathrm{S}_n$-equivariant diffusion model that supports multiple representations. We integrate this model in a planning loop, where conditioning and classifier-based guidance allow us to softly break the symmetry for specific tasks as needed. On navigation and object manipulation tasks, EDGI improves sample efficiency and generalization.
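To make the equivariance claim concrete: a map $f$ on trajectories is equivariant under a group $G$ if $f(g \cdot x) = g \cdot f(x)$ for every $g \in G$. The sketch below checks this property numerically for a hand-written toy map, not for the EDGI network itself. Everything in it is an illustrative assumption: the function names, the trajectory layout `(T, n, 3)` (timesteps, objects, 3D positions), the coefficients `a` and `b`, and the use of a circular shift to stand in for discrete-time translation on a finite horizon.

```python
import numpy as np


def random_rotation(rng):
    """Sample a proper rotation matrix (det = +1) via QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0  # flip one axis to turn a reflection into a rotation
    return q


def toy_equivariant_map(x, a=0.3, b=0.2):
    """Toy map on trajectories x of shape (T, n, 3); NOT the EDGI network.

    Each point is moved toward the per-timestep centroid of all objects
    and along its own finite-difference velocity (circular in time).
    """
    centroid = x.mean(axis=1, keepdims=True)   # (T, 1, 3), permutation-invariant
    velocity = np.roll(x, -1, axis=0) - x      # x[t+1] - x[t], wraps around
    return x + a * (centroid - x) + b * velocity


def check_equivariance(seed=0, T=8, n=4, shift=3):
    """Compare f(g . x) with g . f(x) for one sample from each group."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(T, n, 3))
    R, t = random_rotation(rng), rng.normal(size=3)
    perm = rng.permutation(n)
    f = toy_equivariant_map
    return {
        # SE(3): rotate and translate every position
        "SE(3)": bool(np.allclose(f(x @ R.T + t), f(x) @ R.T + t)),
        # S_n: relabel the objects
        "S_n": bool(np.allclose(f(x[:, perm]), f(x)[:, perm])),
        # Z: shift the trajectory in time (circular stand-in)
        "Z": bool(np.allclose(f(np.roll(x, shift, axis=0)),
                              np.roll(f(x), shift, axis=0))),
    }


if __name__ == "__main__":
    print(check_equivariance())
```

The toy map passes all three checks because it is built only from differences of positions and from a mean over objects. Differences transform with the rotation alone, and the mean does not depend on object order. Adding a term that depends on absolute coordinates, such as a fixed goal position, would break the $\mathrm{SE(3)}$ check. This is loosely analogous to how task-specific conditioning breaks the symmetry in the planning loop described above.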