Human drivers use their attentional mechanisms to focus on critical objects and make decisions while driving. Since human attention can be inferred from gaze data, capturing and analyzing gaze information has emerged in recent years as a way to benefit autonomous driving technology. Previous works in this context have primarily aimed at predicting "where" human drivers look and lack knowledge of "what" objects drivers focus on. Our work bridges the gap between pixel-level and object-level attention prediction. Specifically, we propose to integrate an attention prediction module into a pretrained object detection framework and to predict attention in a grid-based style. Furthermore, critical objects are recognized based on the predicted attended-to areas. We evaluate our proposed method on two driver attention datasets, BDD-A and DR(eye)VE. Our framework achieves competitive state-of-the-art performance in attention prediction at both the pixel level and the object level, while being far more computationally efficient (75.3 fewer GFLOPs).
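To illustrate how object-level attention could be derived from a pixel- or grid-level attention map, the following is a minimal sketch of one plausible post-processing step: each detected bounding box is scored by the mean predicted attention inside it, and boxes above a threshold are flagged as critical objects. The function name, the threshold value, and the scoring rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def object_attention_scores(attention_map, boxes, threshold=0.5):
    """Hypothetical post-processing: score each detected box by the mean
    predicted attention inside it, then flag boxes above `threshold`
    as critical objects.

    attention_map : (H, W) array of predicted attention values in [0, 1]
    boxes         : list of (x1, y1, x2, y2) detections in pixel coordinates
    """
    scores = []
    for (x1, y1, x2, y2) in boxes:
        region = attention_map[y1:y2, x1:x2]
        scores.append(float(region.mean()) if region.size else 0.0)
    critical = [box for box, s in zip(boxes, scores) if s >= threshold]
    return scores, critical


# Example usage with a toy attention map and two detections.
attn = np.zeros((8, 8))
attn[2:5, 2:5] = 0.9                      # attended-to region
boxes = [(2, 2, 5, 5), (6, 6, 8, 8)]      # one attended box, one not
scores, critical = object_attention_scores(attn, boxes)
print(scores)    # e.g. [0.9, 0.0]
print(critical)  # e.g. [(2, 2, 5, 5)]
```

Averaging attention within each box is only one possible aggregation; maximum pooling or area-normalized sums would be equally reasonable choices for turning a dense attention map into per-object scores.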