Sim2Real aims to train policies in high-fidelity simulation environments and transfer them effectively to the real world. Despite advances in accurate simulators and Sim2Real RL methods, policies trained purely in simulation often suffer significant performance drops when deployed in real environments; this drop is referred to as the Sim2Real performance gap. Current Sim2Real RL methods optimize simulator accuracy and variability as proxies for real-world performance. However, as established theoretically and empirically in the literature, these metrics do not necessarily correlate with the real-world performance of the policy. We propose a novel framework that addresses this issue by directly adapting the simulator parameters based on real-world performance. We frame the problem as bi-level RL: the inner level trains a policy purely in simulation, and the outer level adapts the simulation model and in-sim reward parameters to maximize the real-world performance of the in-sim policy. We derive, and validate on simple examples, the mathematical tools needed to develop bi-level RL algorithms that close the Sim2Real performance gap.
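To make the bi-level structure concrete: the outer level solves max_theta J_real(pi*(theta)), where pi*(theta) = argmax_pi J_sim(pi; theta). That is, the simulator and in-sim reward parameters theta are tuned against the real-world return of the policy trained inside the theta-parameterized simulator. The following is a minimal, self-contained Python sketch of this loop on a toy 1D system; every name in it (ToyEnv, train_in_sim, the drag parameter, the finite-difference outer update) is a hypothetical illustration of the framing, not the authors' algorithm.

```python
# Hypothetical sketch of the bi-level Sim2Real loop: the outer level perturbs
# the simulator parameter theta, retrains a policy in the perturbed simulator
# (inner level), and follows a finite-difference estimate of the gradient of
# real-world return with respect to theta. Not the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)

class ToyEnv:
    """1D point mass pushed toward the origin; `drag` is the dynamics parameter."""
    def __init__(self, drag):
        self.drag = drag

    def rollout(self, gain, horizon=50):
        x, ret = 1.0, 0.0
        for _ in range(horizon):
            u = -gain * x                       # linear state-feedback policy
            x = x + 0.1 * (u - self.drag * x)   # Euler step of dx/dt = u - drag*x
            ret -= x * x                        # reward: negative squared error
        return ret

def train_in_sim(env, iters=100, step=0.1):
    """Inner level: stochastic hill climb on the policy gain, purely in simulation."""
    gain, best = 1.0, env.rollout(1.0)
    for _ in range(iters):
        cand = gain + step * rng.standard_normal()
        ret = env.rollout(cand)
        if ret > best:
            gain, best = cand, ret
    return gain

real_env = ToyEnv(drag=0.8)   # "real world": drag value unknown to the learner
theta = 0.2                   # outer-level simulator parameter (drag estimate)

for outer_iter in range(30):
    def real_return(t):
        policy = train_in_sim(ToyEnv(drag=t))   # inner-level training in sim(theta)
        return real_env.rollout(policy)         # real-world evaluation of that policy

    # Outer level: two-point finite-difference step on real-world return w.r.t. theta.
    eps = 0.05
    grad = (real_return(theta + eps) - real_return(theta - eps)) / (2 * eps)
    theta += 0.1 * grad
    print(f"iter {outer_iter:2d}  theta={theta:.3f}  real return={real_return(theta):.3f}")
```

The hill climb stands in for any in-sim RL algorithm, and the finite-difference update stands in for the outer-level RL update the abstract describes; the point of the sketch is only the nesting, in which the outer objective is measured on the real environment rather than on simulator accuracy.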