Video object detection is an important yet challenging topic in computer vision. Traditional methods mainly focus on designing image-level or box-level feature propagation strategies to exploit temporal information. This paper argues that a more effective and efficient feature propagation framework can improve video object detectors in both accuracy and speed. To this end, this paper studies object-level feature propagation and proposes an object query propagation (QueryProp) framework for high-performance video object detection. QueryProp contains two propagation strategies: 1) queries are propagated from sparse key frames to dense non-key frames to reduce redundant computation on non-key frames; 2) queries are propagated from previous key frames to the current key frame to improve feature representation through temporal context modeling. To further facilitate query propagation, an adaptive propagation gate is designed to achieve flexible key frame selection. We conduct extensive experiments on the ImageNet VID dataset. QueryProp achieves accuracy comparable to state-of-the-art methods and strikes a decent accuracy/speed trade-off. Code is available at https://github.com/hf1995/QueryProp.
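The control flow implied by the two propagation strategies and the gate can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the released QueryProp code: `run_video`, `PropagationState`, `toy_heavy`, `toy_light`, and `toy_gate` are hypothetical names, frames are reduced to single floats, and the gate rule (a fixed threshold on the change since the last key frame) is a placeholder for whatever the learned adaptive gate actually computes.

```python
from dataclasses import dataclass, field
from typing import List

Query = List[float]  # toy stand-in for one object query embedding


@dataclass
class PropagationState:
    # Queries produced on the most recent key frame.
    key_queries: List[Query] = field(default_factory=list)
    # Queries kept from earlier key frames, used as temporal context.
    memory: List[List[Query]] = field(default_factory=list)


def run_video(frames, heavy_detector, light_detector, gate, memory_size=3):
    """Process frames in order, choosing key or non-key handling per frame."""
    state = PropagationState()
    results = []
    for index, frame in enumerate(frames):
        # The first frame is always a key frame; afterwards the gate decides.
        if index == 0 or gate(frame, state.key_queries):
            # Strategy 2: previous key-frame queries enrich the current key frame.
            queries = heavy_detector(frame, state.memory)
            state.memory = (state.memory + [queries])[-memory_size:]
            state.key_queries = queries
            results.append(("key", queries))
        else:
            # Strategy 1: reuse key-frame queries on a cheap non-key path.
            queries = light_detector(frame, state.key_queries)
            results.append(("non-key", queries))
    return results


# Hypothetical toy components: a frame is a single float "appearance" value.
def toy_heavy(frame, memory):
    past = [queries[0][0] for queries in memory]
    context = sum(past) / len(past) if past else frame
    return [[frame, context]]


def toy_light(frame, key_queries):
    # Refresh the appearance component, keep the propagated context component.
    return [[frame, query[1]] for query in key_queries]


def toy_gate(frame, key_queries, threshold=0.5):
    # Assumed rule: request a new key frame when appearance drifts too far.
    return abs(frame - key_queries[0][0]) > threshold


results = run_video([0.0, 0.1, 0.2, 1.0, 1.1], toy_heavy, toy_light, toy_gate)
roles = [role for role, _ in results]
```

In this toy run only the first and fourth frames take the expensive key-frame path, which is the source of the speed gain the abstract describes; the remaining frames reuse propagated queries.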