Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines

The proliferation of camera-enabled devices and large video repositories has given rise to a diverse set of video analytics applications. The video pipelines for these applications are DAGs of operations that transform videos, process extracted metadata, and answer questions such as "Is this intersection congested?" The latency and resource efficiency of pipelines can be optimized using configurable knobs for each operation, such as the sampling rate, batch size, or type of hardware used. However, determining efficient configurations is challenging because (a) the configuration search space is exponentially large, and (b) the optimal configuration depends on the desired latency target and on the input video contents, which may exercise different paths in the DAG and produce different volumes of intermediate results. Existing video analytics and processing systems leave it to users to manually configure operations and select hardware resources. As a result, we observe that they often execute inefficiently and fail to meet latency and cost targets. We present Llama, a heterogeneous and serverless framework for auto-tuning video pipelines. Llama optimizes the overall video pipeline latency by (a) dynamically calculating a latency target for each operation invocation, and (b) dynamically running a cost-based optimizer to determine efficient configurations that meet the target latency for each invocation. This makes the problem of auto-tuning large video pipelines tractable and allows us to handle input-dependent behavior, conditional branches in the DAG, and execution variability. We describe the algorithms in Llama and evaluate it on a cloud platform using serverless CPU and GPU resources. We show that, compared to state-of-the-art cluster and serverless video analytics and processing systems, Llama achieves 7.9x lower latency and 17.2x cost reduction on average.
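
The two-step idea in the abstract (derive a latency target for one invocation, then pick a low-cost configuration that meets it) can be illustrated with a toy sketch. This is not Llama's actual algorithm, which the abstract does not specify. The `Config` fields, the slack formula in `per_invocation_target`, the fallback rule, and all numbers below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    """One candidate knob setting for an operation (all values hypothetical)."""
    hardware: str
    batch_size: int
    est_latency_s: float  # assumed: profiled latency of one invocation
    est_cost_usd: float   # assumed: profiled cost of one invocation


def per_invocation_target(pipeline_target_s: float,
                          elapsed_s: float,
                          est_downstream_s: float) -> float:
    """Assumed slack rule: time left after reserving an estimate for later operations."""
    return max(0.0, pipeline_target_s - elapsed_s - est_downstream_s)


def cheapest_feasible(configs: List[Config], target_s: float) -> Optional[Config]:
    """Return the lowest-cost config whose estimated latency meets the target.

    If no config meets the target, fall back to the fastest one (an assumption).
    """
    if not configs:
        return None
    feasible = [c for c in configs if c.est_latency_s <= target_s]
    if feasible:
        return min(feasible, key=lambda c: c.est_cost_usd)
    return min(configs, key=lambda c: c.est_latency_s)


# Hypothetical candidates for a single operation.
candidates = [
    Config("cpu", 1, est_latency_s=4.0, est_cost_usd=0.002),
    Config("cpu", 8, est_latency_s=2.5, est_cost_usd=0.004),
    Config("gpu", 8, est_latency_s=0.6, est_cost_usd=0.010),
]

# 10 s pipeline target, 5 s already spent, 2 s reserved for downstream operations.
target = per_invocation_target(10.0, elapsed_s=5.0, est_downstream_s=2.0)
choice = cheapest_feasible(candidates, target)
```

With these made-up numbers the slack is 3 s, so the single-item CPU configuration is too slow and the batched CPU configuration is chosen over the more expensive GPU one. Recomputing the target at every invocation is what lets a scheme like this react to input-dependent behavior, since the elapsed time and the downstream estimate change as the pipeline runs.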
