Anticipating the motion of vehicles in a scene is an essential problem for safe autonomous driving systems. To this end, understanding the scene's infrastructure is often the main cue for predicting future trajectories. Most proposed approaches represent the scene in a rasterized format, and some more recent approaches leverage custom vectorized formats. In contrast, we propose representing the scene's information with Scalable Vector Graphics (SVG). SVG is a well-established format that matches the problem of trajectory prediction better than rasterized formats while being more general than arbitrary vectorized formats. SVG has the potential to provide the convenience and generality of raster-based solutions, provided it is coupled with a tool as powerful as CNNs are for raster inputs; for this purpose we introduce SVG-Net. SVG-Net is a Transformer-based neural network that can effectively capture the scene's information from SVG inputs. Thanks to the self-attention mechanism in its Transformers, SVG-Net can also capture the relations between the scene and the agents. We demonstrate SVG-Net's effectiveness by evaluating its performance on the publicly available Argoverse forecasting dataset. Finally, we illustrate how, by using SVG, one can benefit from datasets and advancements in other research areas that use the same input format. Our code is available at https://vita-epfl.github.io/SVGNet/.
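To make the idea of an SVG scene representation concrete, the following is a minimal sketch of how lane centerlines and an agent's observed trajectory could be written as SVG path elements. It is an illustration under stated assumptions, not the encoding used by SVG-Net: the function names `polyline_to_path` and `scene_to_svg`, the `lane` and `agent` class labels, the straight-line (`M`/`L`) path commands, and the 100-unit view box are all hypothetical choices made for this example.

```python
from typing import Sequence, Tuple
import xml.etree.ElementTree as ET

Point = Tuple[float, float]
SVG_NS = "http://www.w3.org/2000/svg"


def polyline_to_path(points: Sequence[Point], precision: int = 2) -> str:
    """Encode a polyline as SVG path data: one move-to (M), then line-to (L) commands."""
    if len(points) < 2:
        raise ValueError("a polyline needs at least two points")
    commands = []
    for i, (x, y) in enumerate(points):
        op = "M" if i == 0 else "L"
        commands.append(f"{op} {x:.{precision}f} {y:.{precision}f}")
    return " ".join(commands)


def scene_to_svg(
    lanes: Sequence[Sequence[Point]],
    history: Sequence[Point],
    size: float = 100.0,
) -> str:
    """Build a small SVG document: one path per lane, then one path for the agent.

    The class names and view box are assumptions made for this illustration.
    """
    root = ET.Element("svg", {"xmlns": SVG_NS, "viewBox": f"0 0 {size:g} {size:g}"})
    for lane in lanes:
        ET.SubElement(root, "path", {"class": "lane", "d": polyline_to_path(lane)})
    ET.SubElement(root, "path", {"class": "agent", "d": polyline_to_path(history)})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Toy scene: one straight lane and three observed agent positions.
    lanes = [[(0.0, 50.0), (100.0, 50.0)]]
    history = [(10.0, 50.0), (20.0, 50.0), (30.0, 50.0)]
    print(scene_to_svg(lanes, history))
```

In such a representation every scene element is a short sequence of path commands with coordinates, which is the kind of sequential input a Transformer can consume directly, and the same path syntax is shared by SVG data from other domains.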