AccMPEG: Optimizing Video Encoding for Video Analytics

With more videos being recorded by edge sensors (cameras) and analyzed by computer-vision deep neural nets (DNNs), a new breed of video streaming systems has emerged, with the goal of compressing and streaming videos to remote servers in real time while preserving enough information to allow highly accurate inference by the server-side DNNs. An ideal design of such a video streaming system should simultaneously meet three key requirements: (1) low encoding and streaming latency, (2) high accuracy of the server-side DNNs, and (3) low compute overhead on the camera. Unfortunately, despite many recent efforts, such a video streaming system has hitherto been elusive, especially when serving advanced vision tasks such as object detection or semantic segmentation. This paper presents AccMPEG, a new video encoding and streaming system that meets all three requirements. The key is to learn how much the encoding quality of each 16x16 macroblock can influence the server-side DNN accuracy, which we call the accuracy gradient. Our insight is that these macroblock-level accuracy gradients can be inferred with sufficient precision by feeding the video frames through a cheap model. AccMPEG provides a suite of techniques that, given a new server-side DNN, can quickly create a cheap model to infer the accuracy gradient on any new frame in near real time. Our extensive evaluation of AccMPEG on two types of edge devices (one Intel Xeon Silver 4100 CPU or one NVIDIA Jetson Nano) and three vision tasks (six recent pre-trained DNNs) shows that AccMPEG, with the same camera-side compute resources, can reduce the end-to-end inference delay by 10-43% without hurting accuracy compared to the state-of-the-art baselines.
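To make the accuracy-gradient idea more concrete, here is a minimal sketch in Python, assuming the cheap model has already produced one score per 16x16 macroblock. The function names (`macroblock_grid`, `qp_map_from_gradient`, `high_quality_fraction`), the threshold, the two QP values, and the example scores are all hypothetical illustrations, not AccMPEG's actual interface or data. The sketch only shows how such a score map could be turned into a per-macroblock quality decision for an encoder that accepts a per-macroblock quantization parameter (QP), where a lower QP means higher quality.

```python
import math
from typing import List, Tuple

MB_SIZE = 16  # macroblock side length in pixels (16x16, as in the abstract)


def macroblock_grid(width: int, height: int) -> Tuple[int, int]:
    """Return (columns, rows) of 16x16 macroblocks needed to cover a frame.

    Partial macroblocks at the right/bottom edges count as full ones,
    which is why ceil() is used.
    """
    return math.ceil(width / MB_SIZE), math.ceil(height / MB_SIZE)


def qp_map_from_gradient(grad: List[List[float]], threshold: float,
                         hq_qp: int, lq_qp: int) -> List[List[int]]:
    """Turn a per-macroblock accuracy-gradient map into a per-macroblock QP map.

    Hypothetical rule: macroblocks whose predicted gradient is at or above
    `threshold` get `hq_qp` (lower QP = finer quantization = higher quality);
    all others get `lq_qp`.
    """
    if hq_qp >= lq_qp:
        raise ValueError("hq_qp must be lower than lq_qp (lower QP = higher quality)")
    return [[hq_qp if g >= threshold else lq_qp for g in row] for row in grad]


def high_quality_fraction(qp_map: List[List[int]], hq_qp: int) -> float:
    """Fraction of macroblocks assigned the high-quality QP."""
    total = sum(len(row) for row in qp_map)
    hq = sum(qp == hq_qp for row in qp_map for qp in row)
    return hq / total if total else 0.0


if __name__ == "__main__":
    cols, rows = macroblock_grid(64, 32)  # a tiny 64x32 frame -> 4 x 2 macroblocks
    # Made-up scores standing in for the cheap model's output (rows x cols).
    grad = [[0.9, 0.1, 0.05, 0.6],
            [0.0, 0.2, 0.7, 0.3]]
    assert len(grad) == rows and all(len(r) == cols for r in grad)
    qp = qp_map_from_gradient(grad, threshold=0.5, hq_qp=30, lq_qp=40)
    print(qp)                              # [[30, 40, 40, 30], [40, 40, 30, 40]]
    print(high_quality_fraction(qp, 30))   # 0.375
```

A binary high/low split is the simplest possible mapping; an encoder could instead receive a graded QP per macroblock. The abstract does not say which scheme AccMPEG itself uses, so treat the thresholding rule above as an illustrative assumption.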
