In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and topologies it induces, are thus of crucial importance, as they directly affect the performance of these algorithms. Indeed, a number of recent works introduce algorithms assuming the existence of "well-behaved" neighbourhoods, but leave the full specification of such topologies for future work. In this paper we introduce a unified formalism for defining these topologies through the lens of metrics. We establish a hierarchy amongst these metrics and demonstrate their theoretical implications for the Markov Decision Process specifying the reinforcement learning problem. We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
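To make the connection between metrics and topologies concrete, the following is a minimal illustrative sketch (an aside, not drawn from the paper itself): a metric d on the state space induces a topology via its open balls, and if the optimal value function is Lipschitz continuous with respect to d, an estimate at one state transfers to all nearby states with bounded error. The Lipschitz constant K below is a hypothetical symbol introduced here purely for illustration.

\[
  \mathcal{B}_d(s,\varepsilon) \;=\; \{\, s' \in \mathcal{S} \,:\, d(s,s') < \varepsilon \,\},
  \qquad
  |V^*(s) - V^*(s')| \;\le\; K \, d(s,s') \;<\; K\varepsilon
  \quad \text{for all } s' \in \mathcal{B}_d(s,\varepsilon).
\]

Under this (assumed) Lipschitz condition, the radius of the neighbourhood directly controls the generalization error, which is why the choice of metric, and hence of topology, matters for algorithmic performance.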