Planning with Expectation Models

Distribution and sample models are two popular model choices in model-based reinforcement learning (MBRL). However, learning these models can be intractable, particularly when the state and action spaces are large. Expectation models, on the other hand, are easier to learn due to their compactness and have been widely used for deterministic environments. For stochastic environments, it is not obvious how expectation models can be used for planning, as they only partially characterize a distribution. In this paper, we propose a sound way of using approximate expectation models for MBRL. In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.
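
The equivalence in point 1) follows from linearity of expectation: if v(x) = w · x, then E[v(x')] = w · E[x'], so a backup needs only the expected next-state feature vector. The sketch below illustrates this on a toy problem. It is not the paper's algorithm; the feature vectors, probabilities, weights, and function names are all hypothetical, and reward and discounting are omitted to keep only the next-state value term.

```python
# Toy illustration (all numbers and names are hypothetical, not from the paper).

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# Distribution model for one (state, action) pair: the possible next-state
# feature vectors together with their probabilities.
next_features = [(1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
probs = [0.5, 0.3, 0.2]

# Expectation model for the same pair: only the mean next-state feature vector.
expected_features = tuple(
    sum(p * x[i] for p, x in zip(probs, next_features)) for i in range(2)
)

w = (0.4, -1.5)  # assumed weights of the value function

def linear_value(x):
    # Value function that is linear in the state features.
    return dot(w, x)

def nonlinear_value(x):
    # A value function that is NOT linear in the state features.
    return dot(w, x) ** 2

def backup_distribution(value_fn):
    # Expected next-state value under the full distribution model.
    return sum(p * value_fn(x) for p, x in zip(probs, next_features))

def backup_expectation(value_fn):
    # Value of the expected next-state features (expectation model only).
    return value_fn(expected_features)

linear_gap = abs(
    backup_distribution(linear_value) - backup_expectation(linear_value)
)
nonlinear_gap = abs(
    backup_distribution(nonlinear_value) - backup_expectation(nonlinear_value)
)

print("linear gap:", linear_gap)
print("non-linear gap:", nonlinear_gap)
```

With a linear value function the two backups agree up to floating-point rounding, while the squared value function produces a clear gap. This is the sense in which an expectation model only partially characterizes the distribution, yet carries everything a linear value function needs.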
