Physics simulators have shown great promise for conveniently learning reinforcement learning policies in safe, unconstrained environments. However, transferring the acquired knowledge to the real world can be challenging due to the reality gap. To this end, several methods have recently been proposed to automatically infer posterior distributions over simulator parameters from real data, which are then used for domain randomization at training time. These approaches have been shown to work on various robotic tasks under different settings and assumptions. Nevertheless, the literature lacks a thorough comparison of adaptive domain randomization methods in terms of transfer performance and real-data efficiency. In this work, we present an open benchmark for both offline and online methods (SimOpt, BayRn, DROID, DROPO) to shed light on which are most suitable for each setting and task at hand. We found that online methods are limited by the quality of the policy learned at each iteration, which conditions the data collected for the next one, while offline methods may sometimes fail when replaying real-world trajectories in simulation with open-loop commands. The code used will be released at https://github.com/gabrieletiboni/adr-benchmark.
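To illustrate the core idea shared by the compared methods, the following is a minimal, purely illustrative sketch of domain randomization driven by an inferred posterior over dynamics parameters. The toy simulator, the Gaussian posterior, and all parameter names are assumptions for exposition only, not the benchmark's actual code.

```python
# Minimal sketch: train over dynamics sampled from a learned posterior
# (illustrative only; the toy simulator and Gaussian posterior are assumptions).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior over dynamics parameters (e.g. mass, friction),
# as produced by an adaptive DR method such as BayRn or DROPO.
posterior_mean = np.array([1.0, 0.5])   # [mass, friction]
posterior_std = np.array([0.1, 0.05])

def simulate_episode(policy, dynamics_params, horizon=200):
    """Toy stand-in for a physics simulator rollout under given dynamics."""
    mass, friction = dynamics_params
    state, ret = np.zeros(2), 0.0
    for _ in range(horizon):
        action = policy(state)
        # Simplified point-mass dynamics: acceleration scaled by 1/mass,
        # damped by friction; reward keeps the position near zero.
        accel = (action - friction * state[1]) / mass
        state = state + 0.05 * np.array([state[1], accel])
        ret += -abs(state[0])
    return ret

def random_policy(state):
    # Placeholder for the RL policy being trained.
    return rng.uniform(-1.0, 1.0)

# At every training episode, sample fresh dynamics from the posterior so the
# policy is optimized over the inferred distribution of real-world dynamics.
for episode in range(5):
    params = rng.normal(posterior_mean, posterior_std)
    ret = simulate_episode(random_policy, params)
    print(f"episode {episode}: dynamics={params.round(3)}, return={ret:.2f}")
```

Offline methods estimate such a posterior from pre-collected real trajectories, whereas online methods interleave posterior updates with policy training and real-world rollouts.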