Smart City applications such as intelligent traffic routing or accident prevention rely on computer vision methods for exact vehicle localization and tracking. Due to the scarcity of accurately labeled data, detecting and tracking vehicles in 3D from multiple cameras remains difficult to study. We present a massive synthetic dataset for multiple vehicle tracking and segmentation across multiple overlapping and non-overlapping camera views. Unlike existing datasets, which provide tracking ground truth only for 2D bounding boxes, our dataset additionally contains perfect labels for 3D bounding boxes in camera and world coordinates, depth estimation, and instance, semantic, and panoptic segmentation. The dataset consists of 17 hours of labeled video material, recorded from 340 cameras in 64 diverse day, rain, dawn, and night scenes, making it the most extensive dataset for multi-target multi-camera tracking to date. We provide baselines for detection, vehicle re-identification, and single- and multi-camera tracking. Code and data are publicly available.