Learned representations of dynamical systems reduce dimensionality, potentially supporting downstream reinforcement learning (RL). However, no established methods predict a representation's suitability for control, and evaluation is largely done via downstream RL performance, slowing representation design. Towards a principled evaluation of representations for control, we consider the relationship between the true state and the corresponding representation, proposing that ideally each representation corresponds to a unique true state. This motivates two metrics: temporal smoothness of the representation and high mutual information between the true state and the representation. These metrics are related to established representation objectives and are studied on Lagrangian systems, where the true state, its information requirements, and its statistical properties can be formalized for a broad class of systems. The metrics are shown to predict RL performance in a simulated peg-in-hole task when comparing variants of autoencoder-based representations.
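To make the two proposed metrics concrete, the following is a minimal sketch of how they could be computed from logged trajectories. It assumes paired arrays of true states `s` (shape `(T, n)`) and representations `z` (shape `(T, d)`); the finite-difference smoothness score and the per-dimension k-NN mutual information proxy (via scikit-learn's `mutual_info_regression`) are illustrative assumptions, not the paper's exact estimators.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def temporal_smoothness(z: np.ndarray) -> float:
    """Mean squared step size along a representation trajectory.

    z: (T, d) array of representations over T time steps.
    Lower values indicate a temporally smoother representation.
    """
    steps = np.diff(z, axis=0)                     # (T-1, d) finite differences
    return float(np.mean(np.sum(steps ** 2, axis=1)))


def state_representation_mi(s: np.ndarray, z: np.ndarray) -> float:
    """Crude mutual-information proxy between true state and representation.

    s: (T, n) true states; z: (T, d) representations.
    Sums a nonparametric k-NN MI estimate between each state dimension
    and all representation dimensions. This ignores redundancy across
    dimensions and is only a rough stand-in for the joint MI I(s; z).
    """
    return float(sum(
        mutual_info_regression(z, s[:, i]).sum() for i in range(s.shape[1])
    ))


if __name__ == "__main__":
    # Toy example: a smooth trajectory vs. a noisy copy of it.
    rng = np.random.default_rng(0)
    t = np.linspace(0, 2 * np.pi, 500)
    s = np.stack([np.sin(t), np.cos(t)], axis=1)       # (500, 2) true state
    z_smooth = s @ rng.normal(size=(2, 4))             # linear encoding
    z_noisy = z_smooth + 0.5 * rng.normal(size=z_smooth.shape)

    print("smoothness (smooth):", temporal_smoothness(z_smooth))
    print("smoothness (noisy): ", temporal_smoothness(z_noisy))
    print("MI proxy (smooth):  ", state_representation_mi(s, z_smooth))
    print("MI proxy (noisy):   ", state_representation_mi(s, z_noisy))
```

Under the abstract's framing, a representation in which each code corresponds to a unique true state should score well on both quantities: small step-to-step changes along trajectories and high estimated information about the state.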