Sim-to-real transfer remains a major challenge in reinforcement learning (RL) for robotics, as policies trained in simulation often fail to generalize to the real world due to discrepancies in environment dynamics. Domain Randomization (DR) mitigates this issue by exposing the policy to a wide range of randomized dynamics during training, though often at the cost of reduced performance. While standard approaches train policies that are agnostic to these variations, we investigate whether sim-to-real transfer can be improved by conditioning the policy on an estimate of the dynamics parameters -- referred to as the context. To this end, we integrate a context estimation module into a DR-based RL framework and systematically compare state-of-the-art supervision strategies. We evaluate the resulting context-aware policies on both a canonical control benchmark and a real-world pushing task using a Franka Emika Panda robot. Results show that context-aware policies outperform the context-agnostic baseline across all settings, although the best supervision strategy depends on the task.
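To make the idea of a context-aware policy concrete, the following is a minimal sketch, not the paper's implementation: a hypothetical `ContextEstimator` regresses the randomized dynamics parameters from a short history of transitions, and the policy consumes the resulting context vector alongside the observation. All module names, dimensions, and the MSE supervision shown here are illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's code) of a
# context-aware policy under Domain Randomization.
import torch
import torch.nn as nn

class ContextEstimator(nn.Module):
    """Regresses a context vector (estimated dynamics parameters, e.g.
    masses or friction coefficients) from a flattened history of
    (state, action) transitions collected in the current episode."""
    def __init__(self, history_dim: int, context_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_dim, 128), nn.ReLU(),
            nn.Linear(128, context_dim),
        )

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        return self.net(history)

class ContextAwarePolicy(nn.Module):
    """Policy conditioned on the estimated context: the input is the
    concatenation [observation, context]."""
    def __init__(self, obs_dim: int, context_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + context_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, context], dim=-1))

# Direct supervision of the estimator is possible in simulation because the
# randomized dynamics parameters (the "context") are known ground truth there.
# Dimensions below are placeholders: 10-step history of 8-D states, 2-D actions.
estimator = ContextEstimator(history_dim=10 * (8 + 2), context_dim=4)
history = torch.randn(32, 10 * (8 + 2))   # batch of transition histories
true_context = torch.randn(32, 4)         # ground-truth DR parameters
loss = nn.MSELoss()(estimator(history), true_context)  # one supervision strategy
```

At deployment time the true parameters are unavailable, so the policy acts on the estimator's output; the supervision strategies compared in the paper differ in how this estimator is trained.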