Transfer learning is an increasingly common approach for developing performant RL agents. However, it is not well understood how to define the relationship between the source and target tasks, or how this relationship contributes to successful transfer. We present an algorithm called Structural Similarity for Two MDPs, or SS2, that calculates a state similarity measure for states in two finite MDPs based on previously developed bisimulation metrics, and we show that the measure satisfies the properties of a distance metric. Then, through empirical results on GridWorld navigation tasks, we provide evidence that the distance measure can be used to improve transfer performance for Q-Learning agents over previous implementations.
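To make the idea of a bisimulation-style distance between states of two finite MDPs concrete, the following is a minimal sketch and not the SS2 algorithm itself. It assumes both MDPs are deterministic and share the same action set, so the distance between transition distributions reduces to the distance between the two successor states. The function name `cross_mdp_state_distance` and the weights `c_r` and `c_t` are hypothetical choices made for illustration.

```python
import numpy as np


def cross_mdp_state_distance(next1, rew1, next2, rew2,
                             c_r=0.5, c_t=0.5, tol=1e-8, max_iter=1000):
    """Fixed-point iteration for a bisimulation-style distance.

    Assumptions (not taken from the paper): deterministic transitions and
    a shared action set. next1[s, a] is the successor of state s under
    action a in MDP 1, and rew1[s, a] is the reward; likewise for MDP 2.
    Returns d with d[s, t] = distance between state s of MDP 1 and
    state t of MDP 2.
    """
    next1 = np.asarray(next1, dtype=int)
    next2 = np.asarray(next2, dtype=int)
    rew1 = np.asarray(rew1, dtype=float)
    rew2 = np.asarray(rew2, dtype=float)

    n1, n2 = next1.shape[0], next2.shape[0]
    # |R1(s, a) - R2(t, a)| for every (s, t, a); shape (n1, n2, n_actions).
    reward_gap = np.abs(rew1[:, None, :] - rew2[None, :, :])

    d = np.zeros((n1, n2))
    for _ in range(max_iter):
        # Distance between the successors of s and t under each action.
        succ = d[next1[:, None, :], next2[None, :, :]]
        # Worst case over actions of reward gap plus successor distance.
        new_d = np.max(c_r * reward_gap + c_t * succ, axis=2)
        if np.max(np.abs(new_d - d)) < tol:
            return new_d
        d = new_d
    return d
```

With `c_t < 1` the update is a contraction, so the iteration converges. Comparing an MDP with itself gives zero distance on the diagonal, which is a quick sanity check before using such a measure to pick source states for transfer.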