Driver distraction detection is an important computer vision problem that can play a crucial role in enhancing traffic safety and reducing traffic accidents. This paper proposes a novel semi-supervised method for detecting driver distractions based on Vision Transformer (ViT). Specifically, a multi-modal Vision Transformer (ViT-DD) is developed that exploits the inductive information contained in the training signals of both distraction detection and driver emotion recognition. Further, a self-learning algorithm is designed to incorporate driver data without emotion labels into the multi-task training of ViT-DD. Extensive experiments conducted on the SFDDD and AUCDD datasets demonstrate that the proposed ViT-DD outperforms the state-of-the-art approaches for driver distraction detection by 6.5% and 0.9%, respectively. Our source code is released at https://github.com/PurdueDigitalTwin/ViT-DD.
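The abstract does not spell out the self-learning algorithm, so the following is only a minimal sketch of one common way such a scheme could work: a separate emotion model supplies pseudo emotion labels for driver images that lack them, and the two tasks are then trained with a summed loss. The function names, the argmax pseudo-labeling rule, and the `emotion_weight` parameter are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(softmax(logits)[target])

def pseudo_emotion_label(teacher_logits):
    """Assumed step: use the teacher's most likely emotion class as the label."""
    probs = softmax(teacher_logits)
    return max(range(len(probs)), key=probs.__getitem__)

def multitask_loss(distraction_logits, distraction_label,
                   emotion_logits, emotion_label, emotion_weight=1.0):
    """Assumed objective: distraction loss plus a weighted emotion loss."""
    loss_dd = cross_entropy(distraction_logits, distraction_label)
    loss_emo = cross_entropy(emotion_logits, emotion_label)
    return loss_dd + emotion_weight * loss_emo

# One driver image with a distraction label but no emotion label:
# the (hypothetical) teacher fills in the missing emotion target.
emotion_target = pseudo_emotion_label([0.1, 2.0, -1.0])
loss = multitask_loss([0.0, 0.0], 0, [0.0, 0.0, 0.0], emotion_target,
                      emotion_weight=0.5)
```

In this sketch the pseudo label is treated exactly like a ground-truth label once it is assigned; a confidence threshold or soft targets would be natural variations, but the abstract gives no indication of which, if any, the authors use.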