Aiming for higher-level scene understanding, this work presents a neural network approach that takes a bird's-eye-view road-layout map as input and predicts a human-interpretable graph representing the road's topological layout. Our approach elevates the understanding of road layouts from the pixel level to the graph level. To achieve this, we employ an image-graph-image auto-encoder that is trained to regress the graph representation at its bottleneck. Learning is self-supervised by an image reconstruction loss and requires no external manual annotations. We create a synthetic dataset containing common road-layout patterns and use it, in addition to the real-world Argoverse dataset, to train the auto-encoder. This additional synthetic dataset conceptually captures human knowledge of road layouts and makes it available to the network during training, which stabilizes and further improves topological road-layout understanding on the real-world Argoverse dataset. The evaluation shows that our approach performs comparably to a strong fully-supervised baseline.
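To make the image-graph-image idea concrete, the sketch below shows a minimal auto-encoder whose bottleneck is an explicit graph (node coordinates plus a soft adjacency matrix) and which is trained only with an image reconstruction loss. All module shapes, the `max_nodes` cap, and the soft-adjacency parameterization are illustrative assumptions for this sketch, not the paper's actual network.

```python
# Minimal sketch of an image-graph-image auto-encoder for BEV road-layout maps.
# All layer sizes and the graph parameterization are illustrative assumptions.
import torch
import torch.nn as nn

class GraphBottleneckAutoEncoder(nn.Module):
    """Encodes a BEV road-layout map into a graph (node positions + soft
    adjacency) at the bottleneck and decodes it back into an image."""

    def __init__(self, img_size=64, max_nodes=16):
        super().__init__()
        self.max_nodes = max_nodes
        # Image encoder: BEV map -> feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * (img_size // 4) ** 2, 256), nn.ReLU(),
        )
        # Graph heads: node coordinates in [0, 1] and a soft adjacency matrix.
        self.node_head = nn.Linear(256, max_nodes * 2)
        self.adj_head = nn.Linear(256, max_nodes * max_nodes)
        # Graph decoder: graph representation -> reconstructed BEV map.
        self.decoder = nn.Sequential(
            nn.Linear(max_nodes * 2 + max_nodes * max_nodes, 256), nn.ReLU(),
            nn.Linear(256, 64 * (img_size // 4) ** 2), nn.ReLU(),
            nn.Unflatten(1, (64, img_size // 4, img_size // 4)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, bev):
        feat = self.encoder(bev)
        nodes = torch.sigmoid(self.node_head(feat)).view(-1, self.max_nodes, 2)
        adj = torch.sigmoid(self.adj_head(feat)).view(-1, self.max_nodes, self.max_nodes)
        adj = 0.5 * (adj + adj.transpose(1, 2))  # enforce an undirected (symmetric) graph
        graph = torch.cat([nodes.flatten(1), adj.flatten(1)], dim=1)
        recon = self.decoder(graph)
        return recon, nodes, adj

# Self-supervised training step: only the reconstruction loss drives learning,
# so no graph annotations are needed (synthetic or real BEV maps can be mixed).
model = GraphBottleneckAutoEncoder()
bev = torch.rand(8, 1, 64, 64)  # stand-in batch of BEV road-layout maps
recon, nodes, adj = model(bev)
loss = nn.functional.binary_cross_entropy(recon, bev)
loss.backward()
```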