In this paper, we propose a data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy multi-object tracking (MOT) on high-definition (HD) video streams. First, to enable ultra-light video intelligence, we propose temporal frame-filtering and spatial saliency-focusing approaches that reduce the complexity of massive video data. Second, we exploit structure-aware weight sparsity to design a hardware-friendly model compression method. Third, building on the reduced data and model complexity, we propose a sparsity-aware, scalable, and low-power accelerator design that aims to deliver real-time performance with high energy efficiency. Unlike existing works, we take a solid step towards synergized software/hardware co-optimization for realistic MOT model implementation. Compared to the state-of-the-art MOT baseline, our tri-design approach achieves a 12.5x latency reduction, a 20.9x improvement in effective frame rate, 5.83x lower power, and 9.78x better energy efficiency, without a significant drop in accuracy.
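To make the idea of temporal frame filtering concrete, the following is a minimal sketch under our own assumptions, not the filtering method of the framework itself. It assumes grayscale frames stored as nested lists of pixel intensities and uses a mean absolute difference criterion with a hand-picked threshold; the names `mean_abs_diff`, `filter_frames`, and `threshold` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities (0-255).
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def filter_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Return the indices of the frames that would be passed on to the tracker.

    The first frame is always kept. Every later frame is kept only if it
    differs from the most recently kept frame by more than `threshold`.
    """
    kept: List[int] = []
    for i, frame in enumerate(frames):
        if not kept or mean_abs_diff(frames[kept[-1]], frame) > threshold:
            kept.append(i)
    return kept


# Illustrative input: two tiny 2x2 frames, one static and one with a changed pixel.
static = [[10, 10], [10, 10]]
moved = [[10, 10], [10, 200]]
kept_indices = filter_frames([static, static, moved, moved, static], threshold=5.0)
```

In this toy input, frames that repeat the previously kept frame are dropped, so only the frames at which the scene changes reach the downstream model. A practical filter would likely operate on downsampled frames and use a threshold tuned against tracking accuracy.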