In this paper, we propose a Model-Based Reinforcement Learning (MBRL) algorithm for Partially Measurable Systems (PMS), i.e., systems whose state cannot be measured directly but must be estimated through suitable state observers. The proposed algorithm, named Monte Carlo Probabilistic Inference for Learning COntrol for Partially Measurable Systems (MC-PILCO4PMS), relies on Gaussian Processes (GPs) to model the system dynamics, and on a Monte Carlo approach to update the policy parameters. Unlike previous GP-based MBRL algorithms, MC-PILCO4PMS explicitly models the presence of state observers during policy optimization, allowing it to handle PMS. The effectiveness of the proposed algorithm has been tested both in simulation and on two real systems.
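To make the abstract's pipeline concrete, the following is a minimal sketch (not the authors' implementation) of GP-based MBRL with Monte Carlo policy evaluation on a hypothetical toy 1-D system. Everything here is an illustrative assumption: a linear feedback policy, fixed GP hyperparameters, mean-only propagation of particles (the paper samples from the GP posterior), a finite-difference policy update instead of gradient-based optimization, and no state observer in the loop (the observer modeling is the paper's distinguishing contribution and is omitted for brevity).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Collect data from the (unknown) true dynamics: x' = 0.9 x + u + noise ---
def true_step(x, u):
    return 0.9 * x + u + 0.01 * rng.standard_normal()

X = rng.uniform(-1, 1, (50, 2))                 # training inputs: (state, action)
y = np.array([true_step(x, u) for x, u in X])   # observed next states

# --- Fit a GP model of the dynamics (RBF kernel, fixed hyperparameters) ---
def rbf(A, B, ell=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) / ell) ** 2
    return np.exp(-0.5 * d2.sum(-1))

K = rbf(X, X) + 1e-4 * np.eye(len(X))           # kernel matrix + noise jitter
alpha = np.linalg.solve(K, y)                   # precomputed GP weights

def gp_step(x, u):
    # Posterior mean prediction of the next state (variance omitted).
    z = np.array([[x, u]])
    return float(rbf(z, X) @ alpha)

# --- Monte Carlo policy evaluation: roll particles through the GP model ---
x0s = np.linspace(-1, 1, 20)                    # fixed initial particles

def rollout_cost(theta, horizon=10):
    cost = 0.0
    for x in x0s:
        for _ in range(horizon):
            u = -theta * x                      # linear feedback policy
            x = gp_step(x, u)
            cost += x ** 2                      # quadratic state cost
    return cost / len(x0s)

# --- Policy update (finite differences stand in for the paper's gradients) ---
theta, lr, eps = 0.0, 0.05, 1e-3
for _ in range(30):
    g = (rollout_cost(theta + eps) - rollout_cost(theta - eps)) / (2 * eps)
    theta -= lr * g
```

In MC-PILCO4PMS, the key difference from this sketch is that during the particle rollouts the simulated measurements would be passed through the same state observer used on the real system, so the policy is optimized on estimated states rather than on the true (unmeasurable) ones.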