Metrics and continuity in reinforcement learning

In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and topologies it induces, are thus of crucial importance, as they directly affect the performance of the algorithms. Indeed, a number of recent works introduce algorithms that assume the existence of "well-behaved" neighbourhoods but leave the full specification of such topologies for future work. In this paper we introduce a unified formalism for defining these topologies through the lens of metrics. We establish a hierarchy amongst these metrics and demonstrate their theoretical implications for the Markov Decision Process specifying the reinforcement learning problem. We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
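To make concrete why the choice of state similarity matters for generalization, here is a minimal illustrative sketch, not the paper's formalism. It assumes a toy 2-D state space, a handful of sampled states with known values, and a simple rule that estimates the value of an unseen state by averaging the samples within a fixed radius under a chosen metric. The metric names (`euclidean`, `first_coord`), the helper `neighbourhood_estimate`, the sample data, and the radius are all hypothetical. The two metrics induce different neighbourhoods, so they produce different estimates for the same query state.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical toy setting: states are points in R^2.
State = Tuple[float, float]
Metric = Callable[[State, State], float]


def euclidean(s: State, t: State) -> float:
    """Standard Euclidean distance between two states."""
    return math.dist(s, t)


def first_coord(s: State, t: State) -> float:
    """A pseudometric that ignores the second coordinate entirely."""
    return abs(s[0] - t[0])


def neighbourhood_estimate(
    query: State,
    samples: Sequence[State],
    values: Sequence[float],
    metric: Metric,
    radius: float,
) -> Optional[float]:
    """Average the sampled values whose states lie within `radius` of `query`.

    Returns None when no sampled state falls inside the neighbourhood.
    """
    near = [v for s, v in zip(samples, values) if metric(query, s) <= radius]
    if not near:
        return None
    return sum(near) / len(near)


# Hypothetical samples: (state, value) pairs observed during learning.
samples = [(0.0, 0.0), (0.1, 5.0), (2.0, 0.0)]
values = [1.0, 10.0, 0.0]
query = (0.0, 0.1)

est_euclid = neighbourhood_estimate(query, samples, values, euclidean, radius=0.5)
est_coord = neighbourhood_estimate(query, samples, values, first_coord, radius=0.5)
print(est_euclid, est_coord)
```

Under the Euclidean metric only the sample at `(0.0, 0.0)` is a neighbour of the query, giving an estimate of `1.0`; under `first_coord` the sample at `(0.1, 5.0)` also counts as "close", pulling the estimate to `5.5`. The sketch only illustrates that different metrics induce different neighbourhoods and hence different generalization behaviour; which metric is appropriate depends on the structure of the underlying MDP.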
