Automatic detection of natural disasters and incidents has become increasingly important as a tool for rapid response. Many studies have addressed incident detection from still images and text; however, approaches that exploit temporal information remain rather limited. One of the main reasons is the lack of a diverse video dataset covering a wide range of incident types. To address this need, in this paper we present the Video Dataset of Incidents (VIDI), which contains 4,534 video clips spanning 43 incident categories. Each incident class contains around 100 videos, with an average duration of ten seconds. To increase diversity, the videos were collected using search queries in several languages. To assess the performance of recent state-of-the-art approaches, namely Vision Transformer and TimeSformer, and to explore the contribution of video-based information to incident classification, we performed benchmark experiments on VIDI and the Incidents Dataset. We show that these recent methods improve incident classification accuracy, and that employing video data is highly beneficial for the task: using video data raises top-1 accuracy to 76.56%, compared with 67.37% obtained from a single frame. VIDI will be made publicly available. Additional materials can be found at the following link: https://github.com/vididataset/VIDI.
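As a hedged illustration of the evaluation metric behind the reported numbers, the minimal Python sketch below shows how top-1 accuracy is commonly computed and what the single-frame versus video improvement amounts to arithmetically. The function name `top1_accuracy` and the toy scores are hypothetical and are not taken from the VIDI code; only the two accuracy values (67.37% and 76.56%) come from the abstract.

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose highest-scoring class equals the true label.

    Hypothetical helper for illustration, not part of the VIDI release.
    predictions: list of per-class score lists (one list per clip or frame).
    labels: list of integer ground-truth class indices.
    """
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = 0
    for scores, label in zip(predictions, labels):
        # Top-1 prediction: index of the maximum class score (argmax).
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += predicted == label
    return correct / len(labels)


# Top-1 accuracies (%) reported in the abstract.
single_frame = 67.37
video = 76.56

# Absolute gain in percentage points and relative gain over the single-frame baseline.
absolute_gain = round(video - single_frame, 2)
relative_gain = round(100 * (video - single_frame) / single_frame, 1)

if __name__ == "__main__":
    # Hypothetical toy scores for three samples over three classes.
    toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
    toy_labels = [1, 2, 2]
    print(top1_accuracy(toy_scores, toy_labels))  # 2 of 3 correct
    print(absolute_gain, relative_gain)
```

Under these definitions, the reported improvement is about 9.19 percentage points in absolute terms, or roughly 13.6% relative to the single-frame baseline.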