Physics-Informed Model-Based Reinforcement Learning

We apply reinforcement learning (RL) to robotics. One drawback of traditional RL algorithms is their poor sample efficiency, and one way to improve it is model-based RL. In our model-based RL algorithm, we learn a model of the environment, use it to generate imaginary trajectories, and backpropagate through them to update the policy, exploiting the differentiability of the model. Intuitively, learning more accurate models should lead to better performance. Recently, there has been growing interest in developing better deep-neural-network-based dynamics models for physical systems through better inductive biases. We focus on robotic systems undergoing rigid-body motion. We compare two versions of our model-based RL algorithm: one uses a standard deep-neural-network-based dynamics model, and the other uses a much more accurate, physics-informed neural-network-based dynamics model. We show that, in model-based RL, model accuracy mainly matters in environments that are sensitive to initial conditions. In these environments, the physics-informed version of our algorithm achieves significantly better average return and sample efficiency. In environments that are not sensitive to initial conditions, both versions achieve similar average return, while the physics-informed version achieves better sample efficiency. We measure the sensitivity to initial conditions using the finite-time maximal Lyapunov exponent. We also show that, in challenging environments where many samples are needed to learn, physics-informed model-based RL can achieve better average return than state-of-the-art model-free RL algorithms such as Soft Actor-Critic, by generating accurate imaginary data.
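
The abstract uses the finite-time maximal Lyapunov exponent as its measure of sensitivity to initial conditions but does not say how it is computed. As a rough illustration only, the sketch below estimates it with a common two-trajectory approach: integrate a reference state and a slightly perturbed copy, accumulate the logarithm of how much their separation grows at each step, rescale the separation back to its initial size, and divide the total by the elapsed time. The pendulum, its parameters (G, LENGTH), the RK4 integrator, the horizon, the step size, and the perturbation size eps are all assumptions chosen for the example, not details taken from the paper.

```python
import math

G, LENGTH = 9.81, 1.0  # assumed pendulum parameters (not from the paper)

def pendulum_rhs(state):
    """Time derivative of (theta, omega); theta = 0 is hanging straight down."""
    theta, omega = state
    return (omega, -(G / LENGTH) * math.sin(theta))

def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    k1 = pendulum_rhs(state)
    k2 = pendulum_rhs(shifted(state, k1, dt / 2))
    k3 = pendulum_rhs(shifted(state, k2, dt / 2))
    k4 = pendulum_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def finite_time_lyapunov(x0, horizon=3.0, dt=0.01, eps=1e-6):
    """Estimate the finite-time maximal Lyapunov exponent starting from x0."""
    ref = tuple(x0)
    pert = (ref[0] + eps, ref[1])  # perturb theta only (an arbitrary choice)
    n_steps = int(round(horizon / dt))
    log_growth = 0.0
    for _ in range(n_steps):
        ref, pert = rk4_step(ref, dt), rk4_step(pert, dt)
        diff = tuple(p - r for p, r in zip(pert, ref))
        dist = math.hypot(*diff)
        log_growth += math.log(dist / eps)
        # Rescale the separation back to eps so it stays in the linear regime.
        pert = tuple(r + eps * d / dist for r, d in zip(ref, diff))
    return log_growth / (n_steps * dt)

# Start at the unstable (upright) equilibrium and near the stable (hanging) one.
ftle_upright = finite_time_lyapunov((math.pi, 0.0))
ftle_hanging = finite_time_lyapunov((0.1, 0.0))
print(f"upright: {ftle_upright:.2f}  hanging: {ftle_hanging:.2f}")
```

In this toy setting the exponent is clearly positive near the upright equilibrium, where nearby trajectories separate exponentially, and close to zero for small swings about the hanging position. That is the kind of contrast the abstract draws between environments that are and are not sensitive to initial conditions. The value depends on the chosen horizon and state norm, so it should be read as a relative indicator rather than an absolute constant.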
