As the number of installed cameras grows, so do the compute resources required to process and analyze the images they capture. Video analytics enables new use cases, such as smart cities and autonomous driving. At the same time, it pushes service providers to install additional compute resources to cope with demand, while strict latency requirements push compute towards the edge of the network, forming a geographically distributed, heterogeneous set of compute locations that are shared and resource-constrained. Such a landscape of shared and distributed locations forces us to design new techniques that can optimize and distribute work among all available locations and, ideally, make compute requirements grow sublinearly with the number of cameras installed. In this paper, we present FoMO (Focus on Moving Objects), a method that optimizes multi-camera deployments by preprocessing the images of each scene, filtering out empty regions, and composing the regions of interest from multiple cameras into a single image that serves as input to a pre-trained object detection model. Results show that overall system performance can be increased by 8x while accuracy improves by 40% as a by-product of the methodology, all using an off-the-shelf pre-trained model with no additional training or fine-tuning.
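The composition step described in the abstract, in which regions of interest from several cameras are merged into one input image, can be illustrated with a minimal sketch. The abstract does not state how FoMO lays regions out, so the code below substitutes a simple shelf-packing heuristic, and it assumes the regions have already been extracted by an upstream filtering step. The names `Roi`, `Placement`, `pack_rois`, and `to_camera_coords` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Roi:
    """A region of interest cropped from one camera frame (hypothetical type)."""
    camera_id: int
    x: int  # left edge in the source frame
    y: int  # top edge in the source frame
    w: int  # width in pixels
    h: int  # height in pixels


@dataclass(frozen=True)
class Placement:
    """Where an ROI lands on the shared canvas."""
    roi: Roi
    cx: int  # left edge on the canvas
    cy: int  # top edge on the canvas


def pack_rois(rois: List[Roi], canvas_w: int, canvas_h: int
              ) -> Tuple[List[Placement], List[Roi]]:
    """Shelf-pack ROIs onto one canvas and return (placed, leftover).

    This is an assumed layout policy, not the one used by FoMO.
    """
    placed: List[Placement] = []
    leftover: List[Roi] = []
    shelf_x = 0  # next free x position on the current shelf
    shelf_y = 0  # top of the current shelf
    shelf_h = 0  # height of the tallest ROI on the current shelf
    # Placing the tallest ROIs first keeps each shelf tightly filled.
    for roi in sorted(rois, key=lambda r: r.h, reverse=True):
        if roi.w > canvas_w or roi.h > canvas_h:
            leftover.append(roi)  # can never fit on this canvas
            continue
        if shelf_x + roi.w > canvas_w:
            # The current shelf is full: open a new one below it.
            shelf_y += shelf_h
            shelf_x = 0
            shelf_h = 0
        if shelf_y + roi.h > canvas_h:
            leftover.append(roi)  # no vertical room left for this ROI
            continue
        placed.append(Placement(roi, shelf_x, shelf_y))
        shelf_x += roi.w
        shelf_h = max(shelf_h, roi.h)
    return placed, leftover


def to_camera_coords(placements: List[Placement], px: int, py: int
                     ) -> Optional[Tuple[int, int, int]]:
    """Map a canvas point back to (camera_id, x, y) in the source frame."""
    for p in placements:
        if p.cx <= px < p.cx + p.roi.w and p.cy <= py < p.cy + p.roi.h:
            return p.roi.camera_id, p.roi.x + (px - p.cx), p.roi.y + (py - p.cy)
    return None  # the point falls on empty canvas


if __name__ == "__main__":
    rois = [Roi(0, 100, 50, 40, 30), Roi(1, 10, 20, 60, 50), Roi(2, 0, 0, 50, 20)]
    placed, leftover = pack_rois(rois, canvas_w=100, canvas_h=80)
    print(len(placed), len(leftover))
    print(to_camera_coords(placed, 65, 5))
```

Keeping the source offsets in each `Placement` is what would let detections made on the composed image be attributed to the right camera. Any ROI returned in `leftover` would have to wait for another canvas, which is one plausible way a single detector call could serve several cameras at once.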