We introduce StopNet, a motion forecasting (behavior prediction) method that meets the latency requirements for autonomous driving in dense urban environments without sacrificing accuracy. A whole-scene sparse input representation allows StopNet to scale to predicting trajectories for hundreds of road agents with reliable latency. In addition to predicting trajectories, our scene encoder lends itself to predicting whole-scene probabilistic occupancy grids, a complementary output representation suitable for busy urban environments. Occupancy grids allow the autonomous vehicle (AV) to reason collectively about the behavior of groups of agents without processing their individual trajectories. We demonstrate the effectiveness of our sparse input representation and our model in terms of computation and accuracy over three datasets. We further show that co-training consistent trajectory and occupancy predictions improves upon state-of-the-art performance under standard metrics.