We propose an efficient cross-cameras surveillance system called,STAC, that leverages spatio-temporal associations between multiple cameras to provide real-time analytics and inference under constrained network environments. STAC is built using the proposed omni-scale feature learning people reidentification (reid) algorithm that allows accurate detection, tracking and re-identification of people across cameras using the spatio-temporal characteristics of video frames. We integrate STAC with frame filtering and state-of-the-art compression for streaming technique (that is, ffmpeg libx264 codec) to remove redundant information from cross-camera frames. This helps in optimizing the cost of video transmission as well as compute/processing, while maintaining high accuracy for real-time query inference. The introduction of AICity Challenge 2023 Data [1] by NVIDIA has allowed exploration of systems utilizing multi-camera people tracking algorithms. We evaluate the performance of STAC using this dataset to measure the accuracy metrics and inference rate for reid. Additionally, we quantify the reduction in video streams achieved through frame filtering and compression using FFmpeg compared to the raw camera streams. For completeness, we make available our repository to reproduce the results, available at https://github.com/VolodymyrVakhniuk/CS444_Final_Project.
翻译:暂无翻译