Data scarcity has become one of the main obstacles to developing supervised Artificial Intelligence models for Computer Vision. Indeed, Deep Learning-based models systematically struggle when applied to scenarios never seen during training, and they may not be adequately tested in unusual yet crucial real-world situations. This paper presents and publicly releases CrowdSim2, a new synthetic collection of images suitable for people and vehicle detection, gathered from a simulator based on the Unity graphical engine. It consists of thousands of images gathered from various synthetic scenarios resembling the real world, in which we varied some factors of interest, such as the weather conditions and the number of objects in the scenes. The labels are collected automatically and consist of bounding boxes that precisely localize objects belonging to the two object classes, removing humans from the annotation pipeline. We used this new benchmark as a testing ground for some state-of-the-art detectors, showing that our simulated scenarios can be a valuable tool for measuring their performance in a controlled environment.