We propose a Model-Based Reinforcement Learning (MBRL) algorithm named VF-MC-PILCO, specifically designed for mechanical systems where velocities cannot be directly measured. If not adequately handled, this limitation can compromise the success of MBRL approaches. To cope with this problem, we define a velocity-free state formulation consisting of the collection of past positions and inputs. VF-MC-PILCO then uses Gaussian Process Regression to model the dynamics of the velocity-free state and optimizes the control policy through a particle-based policy gradient approach. We compare VF-MC-PILCO with our previous MBRL algorithm, MC-PILCO4PMS, which handles the lack of direct velocity measurements by modeling the presence of velocity estimators. Experiments on both simulated systems (cart-pole and UR5 robot) and real mechanical systems (Furuta pendulum and a ball-and-plate rig) show that the two algorithms achieve comparable performance. Conveniently, VF-MC-PILCO does not require the design and implementation of state estimators, a challenging and time-consuming task that typically must be performed by an expert user.
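To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of the velocity-free state construction and a GP dynamics model trained on it. The history length, the helper make_vf_state, the toy one-dimensional system, and the use of scikit-learn's GaussianProcessRegressor are all assumptions chosen for illustration; the paper's actual models and policy optimization are more elaborate.

```python
# Sketch of a velocity-free state: a stack of the last m+1 measured
# positions and the last m inputs replaces the (unmeasurable) velocity.
# A GP then learns one-step position dynamics from this state.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

HISTORY = 2  # m: number of past steps kept in the velocity-free state (assumed)

def make_vf_state(positions, inputs, t):
    """Velocity-free state at time t: recent positions and recent inputs."""
    q_hist = positions[t - HISTORY:t + 1]   # q_{t-m}, ..., q_t
    u_hist = inputs[t - HISTORY:t]          # u_{t-m}, ..., u_{t-1}
    return np.concatenate([q_hist.ravel(), u_hist.ravel()])

# Toy 1-D system with a hidden velocity, used only to generate training data.
rng = np.random.default_rng(0)
T, dt = 200, 0.05
q = np.zeros(T)
v = 0.0
u = rng.uniform(-1.0, 1.0, size=T)
for t in range(T - 1):
    v += dt * (u[t] - 0.1 * v)                              # hidden velocity
    q[t + 1] = q[t] + dt * v + 1e-3 * rng.standard_normal()  # noisy position

# Training pairs: (velocity-free state, current input) -> next position change.
X = np.array([np.append(make_vf_state(q, u, t), u[t])
              for t in range(HISTORY, T - 1)])
Y = np.array([q[t + 1] - q[t] for t in range(HISTORY, T - 1)])

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, Y)

# Particle-style prediction: sample from the GP posterior so that model
# uncertainty propagates into rollouts, as in particle-based MBRL.
x_star = np.append(make_vf_state(q, u, T - 2), u[T - 2]).reshape(1, -1)
mean, std = gp.predict(x_star, return_std=True)
particles = mean + std * rng.standard_normal(10)  # 10 sampled position deltas
print("predicted position-change particles:", particles)
```

In a full policy-optimization loop, such sampled particles would be rolled forward over the horizon and the expected cost differentiated with respect to the policy parameters; the sketch above only illustrates how the velocity-free state feeds the GP model.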