Detecting and segmenting objects in images based on their content is one of the most active topics in the field of computer vision. Nowadays, this problem can be addressed using Deep Learning architectures such as Faster R-CNN or YOLO, among others. In this paper, we study the behaviour of different configurations of RetinaNet, Faster R-CNN and Mask R-CNN as provided in Detectron2. First, we evaluate qualitatively and quantitatively (Average Precision, AP) the performance of the pre-trained models on the KITTI-MOTS and MOTSChallenge datasets. We observe a significant improvement in performance after fine-tuning these models on the datasets of interest and optimizing their hyperparameters. Finally, we run inference in unusual situations using out-of-context datasets, and present interesting results that help us better understand the networks.