Previous fine-grained datasets mainly focus on classification and are often captured in controlled setups, with the camera focused on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels spanning multiple vehicle types, organized in a three-level hierarchy. While previous classification datasets also include makes for different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The dataset is challenging because it shows vehicles in complex traffic scenarios, with intra-class and inter-class variations in type, scale, pose, occlusion, and lighting conditions. Current object detectors such as YOLOv5 and Faster R-CNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with baseline results for existing object detectors on the FGVD dataset, we present results for a combination of an existing detector and the recent Hierarchical Residual Network (HRN) classifier on the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.
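To make the three-level label hierarchy concrete, the following is a minimal Python sketch of how such labels could be represented and scored at each level. It is an illustration under stated assumptions, not the dataset's actual annotation format or the authors' evaluation code: the level names (`vehicle_type`, `make`, `model`), the class `FineLabel`, and the function `per_level_accuracy` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineLabel:
    """One fine-grained label; the three level names are assumed, not from the dataset."""
    vehicle_type: str  # level 1 (coarsest), e.g. "car" or "truck"
    make: str          # level 2
    model: str         # level 3 (finest)

    def at_level(self, level: int) -> tuple:
        """Return the label truncated to `level` (1 = coarsest, 3 = finest)."""
        if level not in (1, 2, 3):
            raise ValueError("level must be 1, 2, or 3")
        return (self.vehicle_type, self.make, self.model)[:level]


def per_level_accuracy(truth, predicted):
    """Fraction of predictions that match the ground truth at each level.

    A prediction counts as correct at level k only if it agrees with the
    ground truth on every level from 1 through k.
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return {
        level: sum(
            t.at_level(level) == p.at_level(level)
            for t, p in zip(truth, predicted)
        ) / len(truth)
        for level in (1, 2, 3)
    }
```

Because a label at level k is compared together with all of its ancestors, accuracy in this sketch can only stay equal or decrease as the level gets finer, which is one simple way to expose where a flat detector loses hierarchical consistency.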