Knowledge of the symmetries of a reinforcement learning (RL) system can be used to create compressed, semantically meaningful representations of a low-level state space. We present a method for automatically detecting RL symmetries directly from raw trajectory data, without requiring active control of the system. Our method generates candidate symmetries and, for each candidate, trains a recurrent neural network (RNN) to discriminate between the original trajectories and their transformed counterparts. The discriminator's accuracy for each candidate reveals how symmetric the system is under that transformation. This information can be used to build high-level representations that are invariant to all symmetries at the dataset level, and to communicate properties of the RL behavior to users. In experiments on two simulated RL use cases (a pusher robot and a UAV flying in wind), we show that our method recovers the symmetries underlying both the environment physics and the trained RL policy.