Model-based reinforcement learning usually suffers from high sample complexity when training the world model, especially in environments with complex dynamics. To make training for general physical environments more efficient, we introduce Hamiltonian canonical ordinary differential equations into the learning process, which inspires a novel model: the neural ordinary differential auto-encoder (NODA). NODA models the physical world naturally and can flexibly impose Hamiltonian mechanics (e.g., the dimension of the physical equations), which further accelerates training of the environment model. It can consequently empower an RL agent with robust extrapolation from a small number of samples, as well as a guarantee of physical plausibility. Theoretically, we prove that NODA has uniform bounds for multi-step transition errors and value errors under certain conditions. Extensive experiments show that NODA can learn the environment dynamics effectively with high sample efficiency, making it possible to facilitate reinforcement learning agents even at an early stage of training.
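To make the role of Hamiltonian canonical equations concrete, the following minimal sketch (not the authors' implementation; the harmonic-oscillator Hamiltonian and the symplectic Euler integrator are illustrative assumptions) shows the canonical structure dq/dt = ∂H/∂p, dp/dt = -∂H/∂q that a NODA-style model can exploit, and why it yields physically plausible long rollouts:

```python
import numpy as np

def hamiltonian(q, p):
    # Illustrative separable Hamiltonian of a unit-mass harmonic
    # oscillator: H(q, p) = p^2/2 + q^2/2 (an assumption for this sketch).
    return 0.5 * p**2 + 0.5 * q**2

def symplectic_euler_step(q, p, dt):
    # One step of the canonical equations dq/dt = dH/dp, dp/dt = -dH/dq.
    # Symplectic Euler approximately preserves the phase-space structure,
    # the kind of physical-plausibility constraint the abstract refers to.
    p = p - dt * q      # -dH/dq = -q for this H
    q = q + dt * p      #  dH/dp = p
    return q, p

def rollout(q0, p0, dt=0.01, steps=1000):
    # Multi-step transition, analogous to unrolling a learned world model.
    q, p = q0, p0
    for _ in range(steps):
        q, p = symplectic_euler_step(q, p, dt)
    return q, p

q_final, p_final = rollout(1.0, 0.0)
energy_drift = abs(hamiltonian(q_final, p_final) - hamiltonian(1.0, 0.0))
```

Because the integrator respects the canonical structure, the energy drift stays bounded over the entire 1000-step rollout, in contrast to an unconstrained learned dynamics model whose errors can compound freely.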