As machine learning models become increasingly prevalent in motion forecasting systems for autonomous vehicles (AVs), it is critical to ensure that model predictions are safe and reliable. However, exhaustively collecting and labeling the data necessary to fully test the long tail of rare and challenging scenarios is difficult and expensive. In this work, we construct a new benchmark for evaluating and improving model robustness by applying perturbations to existing data. Specifically, we conduct an extensive labeling effort to identify causal agents, i.e., agents whose presence influences human driver behavior in any way, in the Waymo Open Motion Dataset (WOMD), and we use these labels to perturb the data by deleting non-causal agents from the scene. We then evaluate a diverse set of state-of-the-art deep-learning model architectures on our proposed benchmark and find that all models exhibit large shifts under perturbation. Under non-causal perturbations, we observe a $25$-$38\%$ relative change in minADE compared to the original scenes. We then investigate techniques to improve model robustness, including increasing the training dataset size and using targeted data augmentations that drop agents throughout training. We plan to provide the causal agent labels as an additional attribute to WOMD and to release the robustness benchmarks to aid the community in building more reliable and safe deep-learning models for motion forecasting.
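As a rough illustration of the non-causal perturbation and the minADE metric described above, the sketch below drops agents that are not labeled causal from a scene and computes minimum average displacement error over a set of candidate trajectories. The array shapes and function names are hypothetical simplifications; WOMD's actual scenario schema and evaluation tooling differ.

```python
import numpy as np

def drop_non_causal_agents(trajectories, causal_mask):
    """Non-causal perturbation (sketch): keep only agents labeled causal.

    trajectories: float array of shape (num_agents, num_steps, 2), the (x, y)
        positions of every agent in the scene over time.
    causal_mask: bool array of shape (num_agents,); True marks a causal agent,
        i.e., one whose presence influences the human driver's behavior.
    """
    return trajectories[causal_mask]

def min_ade(predictions, ground_truth):
    """Minimum Average Displacement Error (minADE) over K candidate trajectories.

    predictions: float array of shape (K, num_steps, 2), K predicted futures.
    ground_truth: float array of shape (num_steps, 2), the observed future.
    """
    # Per-candidate mean Euclidean displacement from the ground truth,
    # then the best (minimum) candidate is reported.
    errors = np.linalg.norm(predictions - ground_truth, axis=-1).mean(axis=-1)
    return errors.min()
```

Comparing `min_ade` on a model's predictions for the original scene versus the perturbed scene gives the relative minADE shift that the benchmark measures; a robust model should change little when only non-causal agents are removed.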