In recent years, domain randomization over dynamics parameters has gained considerable traction as a method for sim-to-real transfer of reinforcement learning policies in robotic manipulation; however, finding optimal randomization distributions can be difficult. In this paper, we introduce DROPO, a novel method for estimating domain randomization distributions for safe sim-to-real transfer. Unlike prior work, DROPO only requires a limited, precollected offline dataset of trajectories, and explicitly models parameter uncertainty to match real data using a likelihood-based approach. We demonstrate that DROPO is capable of recovering dynamics parameter distributions in simulation and finding a distribution capable of compensating for an unmodeled phenomenon. We also evaluate the method in two zero-shot sim-to-real transfer scenarios, showing successful domain transfer and improved performance over prior methods.
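The likelihood-based matching described above can be made concrete with a minimal sketch: given real transitions (s_t, a_t, s_{t+1}), a candidate distribution over dynamics parameters is scored by how likely the real next-states are under simulated rollouts drawn from that distribution. The simulator interface (`set_dynamics`, `set_state`, `step`), the diagonal-Gaussian parameterization, and the regularized covariance below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def dropo_objective(phi, dataset, sim, n_samples=50, eps=1e-5):
    """Log-likelihood of real next-states under a dynamics parameter distribution.

    phi     : (mean, std) of a diagonal Gaussian over dynamics parameters
    dataset : iterable of real (state, action, next_state) transitions
    sim     : hypothetical simulator with set_dynamics(xi), set_state(s),
              and step(a) -> next_state
    """
    mean, std = phi
    log_lik = 0.0
    for s, a, s_next in dataset:
        # Monte Carlo estimate of the simulated next-state distribution
        sim_next = []
        for _ in range(n_samples):
            xi = np.random.normal(mean, std)   # sample dynamics parameters
            sim.set_dynamics(xi)               # randomize the simulator
            sim.set_state(s)                   # reset to the observed real state
            sim_next.append(sim.step(a))       # replay the recorded action
        sim_next = np.asarray(sim_next)
        # Fit a Gaussian to the simulated next states and score the real one;
        # eps regularizes the covariance against degenerate samples
        mu = sim_next.mean(axis=0)
        cov = np.cov(sim_next, rowvar=False) + eps * np.eye(len(mu))
        log_lik += multivariate_normal.logpdf(s_next, mu, cov)
    return log_lik
```

In this sketch, the objective would be maximized over (mean, std) with a gradient-free optimizer, since physics simulators are typically not differentiable with respect to their dynamics parameters.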