Traffic event cognition and reasoning in videos is an important task with a wide range of applications in intelligent transportation, assisted driving, and autonomous vehicles. In this paper, we create a novel dataset, TrafficQA (Traffic Question Answering), which takes the form of video QA and is built on 10,080 collected in-the-wild videos and 62,535 annotated QA pairs, for benchmarking the cognitive capability of causal inference and event understanding models in complex traffic scenarios. Specifically, we propose 6 challenging reasoning tasks corresponding to various traffic scenarios, so as to evaluate reasoning capability over different kinds of complex yet practical traffic events. Moreover, we propose Eclipse, a novel Efficient glimpse network via dynamic inference, to achieve computation-efficient and reliable video reasoning. Experiments show that our method achieves superior performance while significantly reducing the computation cost. Project page: https://github.com/SUTDCV/SUTD-TrafficQA.