This work explores scene graphs as a distilled representation of high-level information for autonomous driving, applied to future driver-action prediction. Given the scarcity and strong imbalance of data samples, we propose a self-supervision pipeline to infer representative and well-separated embeddings. Interpretability and explainability are key aspects; as such, we embed attention mechanisms in our architecture that produce spatial and temporal heatmaps over the scene graphs. We evaluate our system on the ROAD dataset against a fully supervised approach, showing the superiority of our training regime.
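To make the abstract's notion of spatial and temporal attention over scene graphs concrete, the following is a minimal, hypothetical sketch in plain PyTorch (not the paper's actual architecture; all module and variable names are illustrative assumptions): node-level attention pooling within each frame yields a spatial heatmap over scene-graph nodes, and frame-level attention yields a temporal heatmap, while the pooled vector could feed a self-supervised objective.

```python
import torch
import torch.nn as nn


class AttentiveSceneGraphEncoder(nn.Module):
    """Illustrative encoder: attention over scene-graph nodes, then over frames."""

    def __init__(self, node_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.node_proj = nn.Linear(node_dim, hidden_dim)
        self.spatial_score = nn.Linear(hidden_dim, 1)   # node-level attention scores
        self.temporal_score = nn.Linear(hidden_dim, 1)  # frame-level attention scores

    def forward(self, node_feats: torch.Tensor):
        # node_feats: (T, N, D) = frames x scene-graph nodes x node features
        h = torch.tanh(self.node_proj(node_feats))                        # (T, N, H)
        spatial_att = torch.softmax(self.spatial_score(h), dim=1)         # (T, N, 1): spatial heatmap
        frame_emb = (spatial_att * h).sum(dim=1)                          # (T, H): per-frame embedding
        temporal_att = torch.softmax(self.temporal_score(frame_emb), 0)   # (T, 1): temporal heatmap
        clip_emb = (temporal_att * frame_emb).sum(dim=0)                  # (H,): sequence embedding
        return clip_emb, spatial_att.squeeze(-1), temporal_att.squeeze(-1)


# Usage: 8 frames, 12 scene-graph nodes per frame, 32-dim node features.
encoder = AttentiveSceneGraphEncoder(node_dim=32)
emb, spatial_heatmap, temporal_heatmap = encoder(torch.randn(8, 12, 32))
print(emb.shape, spatial_heatmap.shape, temporal_heatmap.shape)
# torch.Size([128]) torch.Size([8, 12]) torch.Size([8])
```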