Humans have a remarkable ability to make decisions by accurately reasoning about future events, including the future behaviors and states of mind of other agents. Consider driving a car through a busy intersection: it is necessary to reason about the physics of the vehicle, the intentions of other drivers, and their beliefs about your own intentions. If you signal a turn, another driver might yield to you, or if you enter the passing lane, another driver might decelerate to give you room to merge in front. Competent drivers must plan how they can safely react to a variety of potential future behaviors of other agents before they make their next move. This requires contingency planning: explicitly planning a set of conditional actions that depend on the stochastic outcome of future events. In this work, we develop a general-purpose contingency planner that is learned end-to-end using high-dimensional scene observations and low-dimensional behavioral observations. We use a conditional autoregressive flow model to create a compact contingency planning space, and show how this model can tractably learn contingencies from behavioral observations. We develop a closed-loop control benchmark of realistic multi-agent scenarios in a driving simulator (CARLA), on which we compare our method to various noncontingent methods that reason about multi-agent future behavior, including several state-of-the-art deep learning-based planning approaches. We illustrate that these noncontingent planning methods fundamentally fail on this benchmark, and find that our deep contingency planning method achieves significantly superior performance. Code to run our benchmark and reproduce our results is available at https://sites.google.com/view/contingency-planning
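To make the distinction between contingent and noncontingent planning concrete, the following is a minimal toy sketch, not the learned flow-based planner described above. It assumes a single hypothetical intersection decision in which the other driver either yields or proceeds; the probabilities, rewards, and all function names are illustrative assumptions and do not come from the paper or its code release.

```python
# Toy illustration only: every number and name here is a hypothetical assumption.
P_OTHER = {"yields": 0.7, "proceeds": 0.3}  # assumed distribution over the other driver's behavior
EGO_ACTIONS = ("go", "stop")


def reward(ego_action, other_behavior):
    """Assumed rewards: stopping is neutral, going is good only if the other driver yields."""
    if ego_action == "stop":
        return 0.0
    return 1.0 if other_behavior == "yields" else -10.0


def best_noncontingent_plan():
    """Commit to one ego action before observing the other driver."""
    values = {
        a: sum(p * reward(a, o) for o, p in P_OTHER.items())
        for a in EGO_ACTIONS
    }
    best = max(values, key=values.get)
    return best, values[best]


def best_contingent_plan():
    """Choose a conditional action for each possible observed behavior."""
    plan = {o: max(EGO_ACTIONS, key=lambda a: reward(a, o)) for o in P_OTHER}
    value = sum(p * reward(plan[o], o) for o, p in P_OTHER.items())
    return plan, value


if __name__ == "__main__":
    print("noncontingent:", best_noncontingent_plan())
    print("contingent:   ", best_contingent_plan())
```

Under these assumed numbers, the noncontingent planner can only stop, because committing to "go" risks a collision, whereas the contingent planner goes when the other driver yields and stops otherwise, so it obtains a higher expected reward. This sketch enumerates outcomes exhaustively, which is feasible only for a toy problem of this size.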