Simulation is essential to reinforcement learning (RL) before deployment in the real world, especially for safety-critical applications such as robot manipulation. RL agents are, however, sensitive to discrepancies between simulation and the real world, known as the sim-to-real gap. Domain randomization, a technique commonly used to bridge this gap, is limited to imposing heuristically randomized models. We investigate the intrinsic stochasticity of real-time simulation (RT-IS) in off-the-shelf simulation software and its potential to improve the robustness of RL methods and the performance of domain randomization. First, we conduct analytical studies to measure the correlation of RT-IS with computer hardware utilization and to validate its comparability with the natural stochasticity of a physical robot. We then apply RT-IS in the training of an RL agent. Simulation and physical experiment results verify the feasibility and applicability of RT-IS to robust RL agent design for robot manipulation tasks. The RT-IS-powered robust RL agent outperforms conventional RL agents on robots with modeling uncertainties; it requires less heuristic randomization and generalizes better than conventional domain-randomization-powered agents. Our findings offer a new perspective on the sim-to-real problem in practical applications such as robot manipulation.