The ability to understand the surrounding scene is of paramount importance for Autonomous Vehicles (AVs). This paper presents a system that works online with guaranteed real-time response times, reacting immediately to anomalies arising around the AV while using only the video captured by a dash-mounted camera. Our architecture, called MOVAD, relies on two main modules: a short-term memory module that extracts information about the ongoing action, implemented by a Video Swin Transformer adapted to work in an online scenario, and a long-term memory module that also takes the remote past into account by means of a Long Short-Term Memory (LSTM) network. We evaluated the performance of our method on the Detection of Traffic Anomaly (DoTA) dataset, a challenging collection of dash-mounted camera videos of accidents. After an extensive ablation study, MOVAD reaches an AUC score of 82.11%, surpassing the current state of the art by +2.81 AUC. Our code will be available at https://github.com/IMPLabUniPr/movad/tree/icip
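The interplay between the two memory modules in an online setting can be illustrated with a toy frame-by-frame loop. The sketch below is not the MOVAD implementation: `encode_clip` is a hypothetical stand-in for the Video Swin Transformer (it only measures average frame-to-frame change), an exponential moving average stands in for the LSTM state, and the names `OnlineDetector`, `window`, and `decay` are assumptions introduced for illustration. It only shows the control flow: each incoming frame updates a short sliding window, the window is encoded, and the result is compared against a state carried over from all earlier frames.

```python
import math
from collections import deque
from typing import List


def encode_clip(clip: List[List[float]]) -> float:
    """Toy stand-in for the short-term module (NOT a Video Swin Transformer).

    Returns the mean absolute change between consecutive frames in the window.
    """
    if len(clip) < 2:
        return 0.0
    diffs = [
        sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        for prev, cur in zip(clip, clip[1:])
    ]
    return sum(diffs) / len(diffs)


class OnlineDetector:
    """Processes one frame at a time, keeping short- and long-term memory."""

    def __init__(self, window: int = 4, decay: float = 0.9) -> None:
        # Short-term memory: only the most recent `window` frames are kept.
        self.frames = deque(maxlen=window)
        # Long-term memory: a running summary (toy stand-in for an LSTM state).
        self.state = 0.0
        self.decay = decay

    def step(self, frame: List[float]) -> float:
        """Consume one frame and return an anomaly score in [0, 1)."""
        self.frames.append(frame)
        feature = encode_clip(list(self.frames))
        # Score the current window against what the past suggests is normal.
        score = math.tanh(abs(feature - self.state))
        # Update the long-term state only after scoring the current frame.
        self.state = self.decay * self.state + (1.0 - self.decay) * feature
        return score
```

Calling `step` once per incoming frame yields one score per frame without ever looking at future frames, which is the property that makes the loop usable online. In a real system the two stand-ins would be replaced by learned networks and the score by a trained classifier head.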