Detecting and localizing actions in videos is an important problem in practice. State-of-the-art video analytics systems cannot answer such action queries efficiently and effectively because actions often involve complex interactions between objects and are spread across a sequence of frames; detecting and localizing them requires computationally expensive deep neural networks, and the entire sequence of frames must be considered to answer the query effectively. In this paper, we present ZEUS, a video analytics system tailored for answering action queries. We propose a novel technique for efficiently answering these queries using deep reinforcement learning. ZEUS trains a reinforcement learning agent that learns to adaptively modify the input video segments that are subsequently sent to an action classification network. The agent alters the input segments along three dimensions: sampling rate, segment length, and resolution. To meet the user-specified accuracy target, ZEUS's query optimizer trains the agent based on an accuracy-aware, aggregate reward function. Evaluation on three diverse video datasets shows that ZEUS outperforms state-of-the-art frame- and window-based filtering techniques by up to 22.1x and 4.7x, respectively, while consistently meeting the user-specified accuracy target across all queries.
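To make the three-dimensional action space and the accuracy-aware, aggregate reward described above more concrete, the following minimal Python sketch shows one way such an agent's choices and reward could be structured. The knob values, the SegmentConfig type, the aggregate_reward form, and random_policy are illustrative assumptions, not ZEUS's actual implementation.

```python
from dataclasses import dataclass
import random

# Illustrative discrete choices for each of the three dimensions
# the agent controls (values are assumptions, not ZEUS's settings).
SAMPLING_RATES = [1, 2, 4, 8]      # keep every k-th frame
SEGMENT_LENGTHS = [8, 16, 32]      # frames per segment sent to the classifier
RESOLUTIONS = [112, 168, 224]      # spatial resolution in pixels per side

@dataclass
class SegmentConfig:
    sampling_rate: int
    segment_length: int
    resolution: int

def aggregate_reward(correct: bool, cost: float, accuracy_target: float,
                     running_accuracy: float, penalty: float = 1.0) -> float:
    """One plausible accuracy-aware, aggregate reward: favor cheap
    configurations, but penalize the agent whenever aggregate accuracy
    over processed segments drops below the user-specified target."""
    reward = (1.0 if correct else 0.0) - cost
    if running_accuracy < accuracy_target:
        reward -= penalty
    return reward

def random_policy() -> SegmentConfig:
    """Stand-in for the learned policy: sample a configuration uniformly."""
    return SegmentConfig(
        sampling_rate=random.choice(SAMPLING_RATES),
        segment_length=random.choice(SEGMENT_LENGTHS),
        resolution=random.choice(RESOLUTIONS),
    )
```

In a trained system, random_policy would be replaced by a learned policy network that maps features of the current video segment to a configuration, trading classification cost against the accuracy target.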