Driver distraction detection is an important computer vision problem that can play a crucial role in enhancing traffic safety and reducing traffic accidents. In this paper, a Vision Transformer (ViT) based approach for driver distraction detection is proposed. Specifically, a multi-modal Vision Transformer (ViT-DD) is developed, which exploits the inductive information contained in the training signals of both distraction detection and driver emotion recognition. Further, a semi-supervised learning algorithm is designed to include driver data without emotion labels in the supervised multi-task training of ViT-DD. Extensive experiments conducted on the SFDDD and AUCDD datasets demonstrate that the proposed ViT-DD outperforms the state-of-the-art approaches for driver distraction detection by 6.5% and 0.9%, respectively. Our source code is released at https://github.com/PurdueDigitalTwin/ViT-DD.
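The abstract does not spell out how driver data without emotion labels enters the multi-task training. One common way to do this is pseudo-labeling, where a separately trained emotion model supplies labels for unlabeled samples when it is confident enough. The sketch below illustrates that general idea only; it is not the ViT-DD implementation, and every name in it (`pseudo_label`, `multitask_loss`, `threshold`, `emotion_weight`) is a hypothetical chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])


def pseudo_label(teacher_logits, threshold=0.8):
    """Return the teacher's top class if its probability reaches the
    threshold, else None. The threshold value is an assumption."""
    probs = softmax(teacher_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


def multitask_loss(distraction_logits, distraction_label,
                   emotion_logits, emotion_label=None,
                   teacher_emotion_logits=None,
                   emotion_weight=1.0, threshold=0.8):
    """Distraction loss plus a weighted emotion loss.

    If the sample has no emotion label, a pseudo-label from a
    (hypothetical) teacher model is used when it is confident enough;
    otherwise the emotion term is skipped for this sample.
    """
    loss = cross_entropy(distraction_logits, distraction_label)
    if emotion_label is None and teacher_emotion_logits is not None:
        emotion_label = pseudo_label(teacher_emotion_logits, threshold)
    if emotion_label is not None:
        loss += emotion_weight * cross_entropy(emotion_logits, emotion_label)
    return loss
```

With this design, a sample whose pseudo-label is rejected still contributes to the distraction task, so no driver image is discarded just because its emotion label is missing.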