In this paper, we present VisDrone2018, a large-scale visual object detection and tracking benchmark that aims to advance visual understanding tasks on the drone platform. The images and video sequences in the benchmark were captured over various urban and suburban areas of 14 different cities across China, from north to south. Specifically, VisDrone2018 consists of 263 video clips and 10,209 images (with no overlap with the video clips), with rich annotations including object bounding boxes, object categories, occlusion, and truncation ratios, among others. Through intensive effort, the benchmark has more than 2.5 million annotated instances in 179,264 images and video frames. Being the largest such dataset ever published, the benchmark enables extensive evaluation and investigation of visual analysis algorithms on the drone platform. In particular, we design four popular tasks with the benchmark: object detection in images, object detection in videos, single-object tracking, and multi-object tracking. All of these tasks are extremely challenging on the proposed dataset due to factors such as occlusion, large scale and pose variation, and fast motion. We hope the benchmark will substantially boost research and development in visual analysis on drone platforms.