Distracted drivers are dangerous drivers. Equipping advanced driver assistance systems (ADAS) with the ability to detect driver distraction can help prevent accidents and improve driver safety. To detect distraction, an ADAS must be able to monitor the driver's visual attention. We propose a model that takes as input a patch of the driver's face along with a crop of the eye region and classifies the driver's glance into six coarse regions of interest (ROIs) in the vehicle. We demonstrate that an hourglass network, trained with an additional reconstruction loss, learns stronger contextual feature representations than a traditional encoder-only classification module. To make the system robust to subject-specific variations in appearance and behavior, we design a personalized hourglass model tuned with an auxiliary input representing the driver's baseline glance behavior. Finally, we present a weakly supervised multi-domain training regimen that enables the hourglass to jointly learn representations from different domains (varying in camera type and angle), utilizing unlabeled samples and thereby reducing annotation cost.
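To make the core idea concrete, the sketch below illustrates an hourglass (encoder-decoder) classifier trained with an auxiliary reconstruction loss, as described above. This is a minimal PyTorch sketch under stated assumptions, not the paper's implementation: the layer sizes, the channel-wise fusion of the face and eye inputs, and the loss weight `lambda_rec` are all illustrative guesses.

```python
# Minimal sketch (PyTorch assumed). Layer sizes, the face/eye fusion scheme,
# and `lambda_rec` are illustrative assumptions, not the paper's spec.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ROIS = 6  # coarse in-vehicle glance regions, per the abstract

class GlanceHourglass(nn.Module):
    """Encoder-decoder ('hourglass') with a classification head on the
    bottleneck; the decoder reconstructs the input so that the auxiliary
    reconstruction loss pushes the bottleneck toward richer contextual
    features than an encoder-only classifier would learn."""
    def __init__(self):
        super().__init__()
        # Face patch and eye crop concatenated channel-wise (assumption).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 6, 4, stride=2, padding=1),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, NUM_ROIS)
        )

    def forward(self, face, eyes):
        x = torch.cat([face, eyes], dim=1)   # (B, 6, H, W), H and W divisible by 8
        z = self.encoder(x)                  # bottleneck features
        return self.classifier(z), self.decoder(z), x

def joint_loss(logits, recon, target_img, labels, lambda_rec=0.5):
    # Classification loss plus weighted reconstruction loss.
    return F.cross_entropy(logits, labels) + lambda_rec * F.mse_loss(recon, target_img)

# Usage: logits, recon, x = model(face, eyes); loss = joint_loss(logits, recon, x, labels)
```

The personalized variant described in the abstract would additionally condition this model on an input encoding the driver's baseline glance behavior; that conditioning mechanism is not specified here, so it is omitted from the sketch.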