As autonomous vehicles and autonomous racing rise in popularity, so does the need for faster and more accurate detectors. While the naked eye can extract contextual information almost instantly, even from far away, limits on image resolution and computational resources make detecting smaller objects (that is, objects that occupy a small pixel area in the input image) a genuinely challenging task for machines and a wide-open research field. This study explores how the popular YOLOv5 object detector can be modified to improve its performance in detecting smaller objects, with a particular application in autonomous racing. To this end, we investigate how replacing certain structural elements of the model (as well as their connections and other parameters) affects performance and inference time. In doing so, we propose a series of models at different scales, which we name 'YOLO-Z', and which improve mAP by up to 6.9% when detecting smaller objects at 50% IoU, at the cost of just a 3 ms increase in inference time compared to the original YOLOv5. Our objective is to inform future research on the potential of adjusting a popular detector such as YOLOv5 to address specific tasks, and to provide insights into how specific changes can impact small object detection. Such findings, applied to the broader context of autonomous vehicles, could increase the amount of contextual information available to such systems.
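For readers unfamiliar with the metric quoted above, the following is a minimal sketch of the intersection-over-union test that underlies mAP at 50% IoU. It is not taken from the YOLO-Z or YOLOv5 code; the function names `iou` and `is_match`, the `(x1, y1, x2, y2)` box layout, and the example boxes are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) in pixels; this layout is an
    assumption for the sketch, not the paper's data format.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold=0.5):
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical boxes: the same 3-pixel offset on a small and a large object.
small_truth, small_pred = (0, 0, 10, 10), (3, 3, 13, 13)
large_truth, large_pred = (0, 0, 100, 100), (3, 3, 103, 103)

print(is_match(small_pred, small_truth))  # IoU = 49/151, below 0.5
print(is_match(large_pred, large_truth))  # IoU = 9409/10591, above 0.5
```

In this toy case the same localization error fails the 50% threshold on the 10-pixel box but passes on the 100-pixel one, which is one reason small objects tend to score lower under IoU-based metrics.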