Differentiable simulators represent an environment's dynamics as a differentiable function. In robotics and autonomous driving, this property is exploited by Analytic Policy Gradients (APG), which backpropagate through the dynamics to train accurate policies for diverse tasks. Here we show that differentiable simulation also has an important role in world modeling, where it can impart predictive, prescriptive, and counterfactual capabilities to an agent. Specifically, we design three novel task setups in which the differentiable dynamics are composed, within an end-to-end computation graph, not with a policy but with a state predictor. This allows us to learn relative odometry, optimal planners, and optimal inverse states. We collectively call these predictors Analytic World Models (AWMs) and demonstrate how differentiable simulation enables their efficient, end-to-end learning. In autonomous driving scenarios, AWMs are broadly applicable and can extend an agent's decision-making beyond reactive control.
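To make the idea of placing a state predictor and differentiable dynamics in one computation graph concrete, the following is a minimal sketch and not the method of this work. It assumes a hypothetical one-dimensional point mass, a linear predictor that outputs the acceleration needed to reach a target position in one step, and a hand-written chain rule in place of an autodiff library. All names (`step`, `predict`, `loss_and_grad`, `DT`) are illustrative assumptions.

```python
# Toy sketch: train a predictor by backpropagating through known dynamics.
# Every name and constant here is hypothetical, chosen only for illustration.

DT = 0.5  # integration step of the toy simulator


def step(x, v, a):
    """Semi-implicit Euler step of a 1-D point mass; differentiable in a."""
    v_next = v + a * DT
    x_next = x + v_next * DT
    return x_next, v_next


def dx_next_da():
    """Analytic Jacobian d x_next / d a of `step` (constant for this toy)."""
    return DT * DT


def predict(params, x, v, target):
    """Linear predictor: the acceleration meant to reach `target` in one step."""
    w_gap, w_vel, bias = params
    return w_gap * (target - x) + w_vel * v + bias


def loss_and_grad(params, batch):
    """Mean squared position error after one simulated step, and its gradient."""
    loss = 0.0
    grad = [0.0, 0.0, 0.0]
    for x, v, target in batch:
        a = predict(params, x, v, target)
        x_next, _ = step(x, v, a)
        err = x_next - target
        loss += err * err
        # Chain rule: dL/dparam = dL/dx_next * dx_next/da * da/dparam.
        dl_da = 2.0 * err * dx_next_da()
        for i, da_dp in enumerate((target - x, v, 1.0)):
            grad[i] += dl_da * da_dp
    n = len(batch)
    return loss / n, [g / n for g in grad]


def train(batch, lr=4.0, iters=200):
    """Plain gradient descent on the predictor parameters."""
    params = [0.0, 0.0, 0.0]
    for _ in range(iters):
        _, grad = loss_and_grad(params, batch)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


# Fixed grid of (position, velocity, target) samples, so the run is deterministic.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
batch = [(1.0, v, 1.0 + gap) for gap in grid for v in grid]
params = train(batch)
final_loss, _ = loss_and_grad(params, batch)
```

The predictor is never given a target acceleration. The only supervision is the position error measured after the simulator step, and the gradient reaches the predictor's parameters through the Jacobian of `step`. For this toy the exact solution is known in closed form (`w_gap = 1 / DT**2`, `w_vel = -1 / DT`, `bias = 0`), which gives a simple check that the gradient path is wired correctly.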