Learning generalizable policies from visual input in the presence of visual distractions is a challenging problem in reinforcement learning. Recently, there has been renewed interest in bisimulation metrics as a tool to address this issue: because they measure behavioural similarity between states, these metrics can be used to learn representations that are, in principle, invariant to irrelevant distractions. However, accurate, unbiased, and scalable estimation of these metrics has proved elusive in continuous state and action spaces. We propose entangled bisimulation, a bisimulation metric that allows the distance function between states to be specified and that can be estimated without bias in continuous state and action spaces. We show that entangled bisimulation can meaningfully improve over previous methods on the Distracting Control Suite (DCS), even when added on top of data augmentation techniques.
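To make "behavioural similarity between states" concrete, the following is a minimal sketch of a classic bisimulation-style metric on a tiny finite MDP. It is not the entangled bisimulation estimator proposed here: it assumes deterministic transitions (so the distance between next-state distributions reduces to the distance between the two successor states), a unit weight on the reward term, and a discount `gamma` on the successor term. The function name `bisimulation_metric` and the toy MDP are hypothetical and chosen only for illustration.

```python
import numpy as np


def bisimulation_metric(rewards, next_state, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Fixed-point iteration for a bisimulation-style metric (illustrative sketch).

    Assumes a finite MDP with deterministic transitions.
    rewards:    float array of shape (S, A), reward for each state-action pair.
    next_state: int array of shape (S, A), successor state for each pair.
    Returns an (S, S) array d where d[s, t] is the distance between s and t.
    """
    num_states = rewards.shape[0]
    d = np.zeros((num_states, num_states))
    # Reward gap for every state pair and action: shape (S, S, A).
    reward_gap = np.abs(rewards[:, None, :] - rewards[None, :, :])
    for _ in range(max_iters):
        # Current distance between the successors of s and t under each action.
        succ_gap = d[next_state[:, None, :], next_state[None, :, :]]
        # Take the worst case over actions.
        new_d = np.max(reward_gap + gamma * succ_gap, axis=2)
        if np.max(np.abs(new_d - d)) < tol:
            return new_d
        d = new_d
    return d


# Toy MDP (hypothetical): states 0 and 1 behave identically, as if they were
# the same scene with different distracting backgrounds; state 2 is absorbing
# and yields no reward.
rewards = np.array([[1.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 0.0]])
next_state = np.array([[0, 2],
                       [1, 2],
                       [2, 2]])

d = bisimulation_metric(rewards, next_state, gamma=0.9)
print(d[0, 1])  # distance between the two behaviourally identical states
print(d[0, 2])  # distance between a rewarding state and the absorbing state
```

In this toy case the two behaviourally identical states are at distance zero even though they are distinct states, which is the invariance that makes such metrics attractive for representation learning, while the rewarding and absorbing states are far apart (close to 1 / (1 - gamma) under the assumptions above).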