We draw on the latest advancements in the physics community to propose a novel method for discovering the governing non-linear dynamics of physical systems in reinforcement learning (RL). We establish that this method is capable of discovering the underlying dynamics using significantly fewer trajectories (as few as one rollout with $\leq 30$ time steps) than state-of-the-art model learning algorithms. Further, the technique learns a model that is accurate enough to induce near-optimal policies with significantly fewer trajectories than those required by model-free algorithms. It brings the benefits of model-based RL, without requiring a model to be developed in advance, to systems whose dynamics are governed by physics. To establish the validity and applicability of this method, we conduct experiments on four classic control tasks. We find that an optimal policy trained on the discovered dynamics of the underlying system generalizes well. Further, the learned policy performs well when deployed on the actual physical system, thus bridging the model-to-real-system gap. We further compare our method to state-of-the-art model-based and model-free approaches and show that it requires fewer trajectories sampled from the true physical system than the other methods. Finally, we explore approximate dynamics models and find that they also perform well.
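The abstract does not spell out the discovery technique, so the following is only a hedged, minimal sketch of one common physics-community approach to recovering governing dynamics from very few samples: sparse regression over a library of candidate nonlinear terms. The helper names (`candidate_library`, `discover_dynamics`) and the thresholding parameter `lam` are hypothetical and not taken from the paper.

```python
# Illustrative sketch (not the paper's exact algorithm): sparse identification of
# governing dynamics from a short rollout via sequentially thresholded least squares.
import numpy as np

def candidate_library(X):
    """Build a library of candidate terms [1, x, x^2, sin(x), cos(x)] per state dim."""
    return np.hstack([np.ones((X.shape[0], 1)), X, X**2, np.sin(X), np.cos(X)])

def discover_dynamics(X, X_dot, lam=0.1, iters=10):
    """Fit X_dot ~ Theta(X) @ Xi, pruning coefficients smaller than lam."""
    Theta = candidate_library(X)
    Xi, *_ = np.linalg.lstsq(Theta, X_dot, rcond=None)
    for _ in range(iters):                      # sequential thresholding
        Xi[np.abs(Xi) < lam] = 0.0              # drop weak candidate terms
        for j in range(X_dot.shape[1]):         # refit the surviving terms per state dim
            idx = np.abs(Xi[:, j]) >= lam
            if idx.any():
                Xi[idx, j], *_ = np.linalg.lstsq(Theta[:, idx], X_dot[:, j], rcond=None)
    return Xi  # sparse coefficients defining the learned ODE right-hand side

# Usage idea: collect one short rollout (states X, finite-difference derivatives X_dot),
# recover Xi, then plan or train a policy against the learned model instead of the
# real system, sampling the physical system only to validate.
```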