Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection

Generic event boundary detection is an important yet challenging task in video understanding, which aims to detect the moments where humans naturally perceive event boundaries. The main challenge of this task is perceiving the various temporal variations of diverse event boundaries. To this end, this paper presents an effective, end-to-end learnable framework (DDM-Net). To tackle the diversity and complicated semantics of event boundaries, we make three notable improvements. First, we construct a feature bank that stores multi-level spatial and temporal features, prepared for difference calculation at multiple scales. Second, to alleviate the inadequate temporal modeling of previous methods, we present dense difference maps (DDM) to comprehensively characterize motion patterns. Finally, we exploit progressive attention on the multi-level DDM to jointly aggregate appearance and motion clues. As a result, DDM-Net achieves significant boosts of 14% and 8% on the Kinetics-GEBD and TAPOS benchmarks, respectively, and outperforms the top-1 winning solution of the LOVEU Challenge@CVPR 2021 without bells and whistles. This state-of-the-art result demonstrates the effectiveness of richer motion representation and more sophisticated aggregation in handling the diversity of generic event boundary detection. Our code will be made available soon.
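The abstract does not spell out how a dense difference map is computed. As a minimal sketch, assuming a DDM is the matrix of pairwise distances between per-frame features inside a temporal window (L2 distance here), and treating the feature bank as a plain list of per-level feature windows, the idea can be illustrated as follows. All function names are hypothetical, and `boundary_score` is a hand-written illustrative rule, not DDM-Net's learned progressive-attention aggregation.

```python
import math
from typing import List

Vector = List[float]
Map = List[List[float]]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dense_difference_map(window: List[Vector]) -> Map:
    """Pairwise differences between every pair of frames in a temporal window.

    Entry [i][j] is the distance between frame i and frame j, so the map is
    symmetric with a zero diagonal. Assumption: L2 distance is used as the
    difference function; the abstract does not specify one.
    """
    n = len(window)
    return [[l2_distance(window[i], window[j]) for j in range(n)] for i in range(n)]


def multi_level_ddm(feature_bank: List[List[Vector]]) -> List[Map]:
    """One DDM per feature level (hypothetical stand-in for the feature bank)."""
    return [dense_difference_map(level) for level in feature_bank]


def boundary_score(ddm: Map, split: int) -> float:
    """Mean difference between frames before `split` and frames from `split` on.

    Hypothetical hand-written rule for illustration only; DDM-Net instead
    learns to aggregate the maps with progressive attention.
    """
    n = len(ddm)
    cross = [ddm[i][j] for i in range(split) for j in range(split, n)]
    return sum(cross) / len(cross)


if __name__ == "__main__":
    # Four 2-D frame features: frames 0 and 1 are identical, as are frames 2 and 3.
    window = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
    ddm = dense_difference_map(window)
    print(ddm[0][2])  # 5.0
    best = max(range(1, len(window)), key=lambda s: boundary_score(ddm, s))
    print(best)  # 2: the boundary lies between frames 1 and 2
```

In this toy window the map contains two low-difference blocks on its diagonal and large cross-block entries, which is the kind of motion pattern a dense, all-pairs difference representation exposes more fully than differences between adjacent frames alone.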


Related Content

Attention Mechanism


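The attention mechanism was first proposed in the field of computer vision, but what truly made it popular was the Google DeepMind team's paper "Recurrent Models of Visual Attention" [14], which applied an attention mechanism to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly in machine translation; their work is regarded as the first to apply attention to NLP. Similar attention-based RNN extensions were then applied to a wide range of NLP tasks. More recently, how to use attention mechanisms in CNNs has also become an active research topic. The figure below shows the general trend of progress in attention research.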
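The alignment idea from Bahdanau et al. can be sketched in a few lines. This is a minimal pure-Python illustration of additive attention, assuming small hand-picked parameter matrices `W`, `U` and vector `v` in place of learned weights; the helper names are hypothetical, and the keys double as values for brevity.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product for row-major nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (the max is subtracted before exp)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(
    query: Vector, keys: List[Vector], W: Matrix, U: Matrix, v: Vector
) -> Tuple[Vector, Vector]:
    """Additive (Bahdanau-style) attention.

    score_j = v . tanh(W @ query + U @ key_j)
    weights = softmax(scores)
    context = sum_j weights_j * key_j   (keys also serve as values here)
    """
    wq = matvec(W, query)
    scores = []
    for key in keys:
        uk = matvec(U, key)
        hidden = [math.tanh(a + b) for a, b in zip(wq, uk)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Hypothetical hand-picked parameters, not learned weights.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    query = [1.0, 0.0]                # e.g. the previous decoder hidden state
    keys = [[1.0, 0.0], [-1.0, 0.0]]  # e.g. encoder annotations
    weights, context = additive_attention(query, keys, identity, identity, [1.0, 1.0])
    print(weights)  # the key aligned with the query receives the larger weight
    print(context)
```

Each key is scored against the query through a small tanh layer, the scores are normalized with softmax, and the context vector is the weighted sum of the keys. In a translation model this lets the decoder align with different source positions at each output step instead of relying on a single fixed-length encoding.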
