Towards Automatic Model Specialization for Edge Video Analytics

Judging by popular, generic computer vision challenges such as ImageNet or PASCAL VOC, neural networks have proven to be exceptionally accurate in recognition tasks. However, state-of-the-art accuracy often comes at a high computational price and requires hardware acceleration to achieve real-time performance, while use cases such as smart cities require images from fixed cameras to be analyzed in real time. Because of the network bandwidth these streams would consume, we cannot rely on offloading compute to a centralized cloud. Thus, a distributed edge cloud is expected to process images locally. However, the edge is, by nature, resource-constrained, which limits the computational complexity of the models it can execute. Yet there is a need for a meeting point between the edge and accurate real-time video analytics. Specializing lightweight models on a per-camera basis may help, but it quickly becomes infeasible as the number of cameras grows unless the process is automated. In this paper, we present and evaluate COVA (Contextually Optimized Video Analytics), a framework to assist in the automatic specialization of models for video analytics in edge cameras. COVA automatically improves the accuracy of lightweight models through their specialization. Moreover, we discuss and review each step involved in the process to understand the trade-offs that each one entails. Additionally, we show how the sole assumption of static cameras allows us to make a series of considerations that greatly simplify the scope of the problem. Finally, experiments show that state-of-the-art models, i.e., models able to generalize to unseen environments, can be effectively used as teachers to tailor smaller networks to a specific context, boosting accuracy at a constant computational cost. Results show that COVA can automatically improve the accuracy of pre-trained models by an average of 21%.
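To make the teacher-student idea more concrete, the following is a minimal sketch of how a per-camera training set might be assembled automatically. It is not COVA's actual implementation: the abstract does not specify the steps, so the per-pixel median background, the motion test, the thresholds, and all names (`background_model`, `has_motion`, `build_training_set`, the `teacher` callable) are assumptions made purely for illustration. The sketch only shows one way a static camera could simplify the problem (frames that match the background can be skipped) and how a large teacher model could label the remaining frames for a smaller student model.

```python
from statistics import median
from typing import Callable, List, Sequence, Tuple

Frame = List[List[int]]      # grayscale image as rows of pixel intensities
Label = Tuple[str, float]    # (class name, teacher confidence); hypothetical format


def background_model(frames: Sequence[Frame]) -> Frame:
    """Per-pixel median over time. Assumes a static camera, so the
    background stays at the same pixel positions in every frame."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(f[y][x] for f in frames) for x in range(width)]
        for y in range(height)
    ]


def has_motion(frame: Frame, background: Frame,
               pixel_thresh: int = 30, min_pixels: int = 2) -> bool:
    """True if enough pixels differ from the background (illustrative thresholds)."""
    changed = sum(
        1
        for row_f, row_b in zip(frame, background)
        for p, b in zip(row_f, row_b)
        if abs(p - b) > pixel_thresh
    )
    return changed >= min_pixels


def build_training_set(frames: Sequence[Frame],
                       teacher: Callable[[Frame], Label],
                       min_conf: float = 0.5) -> List[Tuple[Frame, str]]:
    """Label only frames with motion, keeping confident teacher predictions."""
    background = background_model(frames)
    dataset = []
    for frame in frames:
        if not has_motion(frame, background):
            continue  # static frame: skip it and save an expensive teacher call
        label, conf = teacher(frame)
        if conf >= min_conf:
            dataset.append((frame, label))
    return dataset
```

The resulting `(frame, label)` pairs would then be used to fine-tune the lightweight model for that camera; that training step depends on the chosen framework and is left out of the sketch.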
