Developing robot controllers in a simulated environment is advantageous, but transferring the controllers to the target environment presents challenges, often referred to as the "sim-to-real gap". We present a method for continuously improving modeling and control after the robot is deployed in a dynamically changing target environment. We develop a differentiable physics simulation framework that performs online system identification and optimal control simultaneously, using incoming observations from the target environment in real time. To make system identification robust to noisy observations, we devise an algorithm that assesses the confidence of the estimated parameters through numerical analysis of the dynamic equations. To ensure real-time optimal control, we adaptively schedule the future optimization window so that optimized actions are replenished faster than they are consumed, while the plan stays as up to date with new sensor information as possible. Continual re-planning on a continually improving model allows the robot to adapt swiftly to the changing environment and to use real-world data in a sample-efficient way. Thanks to a fast differentiable physics simulator, both the system identification and the control optimization can be solved quickly enough for robots operating in real time. We demonstrate our method on a set of examples in simulation and show that it compares favorably with baseline methods.
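The window-scheduling idea can be illustrated with a minimal Python sketch. This is not the authors' implementation. The class `WindowScheduler`, its parameters `margin` and `alpha`, and the use of an exponential moving average to predict solve time are all hypothetical choices made for illustration. Time is counted in control steps of length `dt`. The next window starts `lead` steps ahead, which is the predicted number of steps the optimization will take. This means the result arrives just before it is needed and is computed from the freshest state available. The window must also be longer than `lead`, so that each plan outlasts the time needed to compute the next one, i.e. actions are replenished faster than they are consumed.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowScheduler:
    """Hypothetical scheduler for a receding optimization window.

    All times in steps are multiples of the control period `dt`.
    """
    dt: float                  # control period in seconds (assumed known)
    margin: float = 1.25       # safety factor on the predicted solve time (assumed)
    alpha: float = 0.3         # weight of the newest measurement in the moving average (assumed)
    est_solve_s: float = 0.05  # initial guess of one optimization's wall-clock time (assumed)

    def record_solve_time(self, seconds: float) -> None:
        # Update the exponential moving average of measured optimization times.
        self.est_solve_s = (1 - self.alpha) * self.est_solve_s + self.alpha * seconds

    def lead_steps(self) -> int:
        # Control steps expected to elapse while the next optimization runs,
        # inflated by the safety margin and rounded up to a whole step.
        return math.ceil(self.margin * self.est_solve_s / self.dt)

    def next_window(self, current_step: int, horizon: int) -> tuple[int, int]:
        """Return (start, end) step indices of the next optimization window.

        The window starts as soon as the solver is expected to finish, so the
        plan uses the most recent state estimate. The horizon must exceed the
        lead time; otherwise the robot would run out of planned actions
        before the following optimization completes.
        """
        lead = self.lead_steps()
        if horizon <= lead:
            raise ValueError("horizon too short: actions would run out before the next solve finishes")
        start = current_step + lead
        return start, start + horizon
```

In a control loop, the robot would execute the previously planned actions up to `start`, while the optimizer solves for the window `[start, end)`. After each solve, it would call `record_solve_time` with the measured duration, so that the lead time tracks how fast the optimization actually runs.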