Existing Temporal Action Detection (TAD) methods typically apply a pre-processing step that converts a variable-length input video into a fixed-length sequence of snippet representations before temporal boundary estimation and action classification. This step temporally downsamples the video, which lowers the inference resolution and hampers detection performance at the original temporal resolution. In essence, the cause is a temporal quantization error introduced during resolution downsampling and recovery. This error can hurt TAD performance, yet it is largely ignored by existing methods. To address the problem, we introduce a novel model-agnostic post-processing method that requires neither model redesign nor retraining. Specifically, we model the start and end points of action instances with a Gaussian distribution, enabling temporal boundary inference at a sub-snippet level. We further introduce an efficient Taylor-expansion based approximation, dubbed Gaussian Approximated Post-processing (GAP). Extensive experiments demonstrate that GAP consistently improves a wide variety of pre-trained off-the-shelf TAD models on the challenging ActivityNet (+0.2% to 0.7% in average mAP) and THUMOS (+0.2% to 0.5% in average mAP) benchmarks. These gains are already significant and highly comparable to those achieved by novel model designs. GAP can also be integrated with model training for further performance gains. Importantly, GAP enables lower temporal resolutions for more efficient inference, facilitating low-resource applications. The code will be available at https://github.com/sauradip/GAP
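To make the sub-snippet idea concrete, the following is a minimal sketch of a Taylor-expansion based refinement, not the authors' released implementation. It assumes a 1-D array of per-snippet boundary scores (start or end probabilities) that is roughly Gaussian around its peak; the function name `refine_boundary`, the `eps` parameter, and the finite-difference derivatives are illustrative assumptions. A second-order Taylor expansion of the log-score around the discrete peak index gives a continuous offset, equal to the negative ratio of the first derivative to the second derivative.

```python
import numpy as np


def refine_boundary(scores, peak_idx, eps=1e-10):
    """Refine a discrete boundary index to a sub-snippet location.

    Hypothetical helper: assumes `scores` is locally Gaussian around
    `peak_idx`, so its logarithm is locally quadratic.
    """
    scores = np.asarray(scores, dtype=np.float64)
    # Without both neighbours the derivatives cannot be estimated.
    if peak_idx <= 0 or peak_idx >= len(scores) - 1:
        return float(peak_idx)

    # Work in the log domain; clip to avoid log(0).
    log_s = np.log(np.clip(scores, eps, None))

    # Central finite differences for the first and second derivatives.
    d1 = 0.5 * (log_s[peak_idx + 1] - log_s[peak_idx - 1])
    d2 = log_s[peak_idx + 1] - 2.0 * log_s[peak_idx] + log_s[peak_idx - 1]

    # A non-negative curvature means this is not a proper local maximum.
    if d2 >= 0:
        return float(peak_idx)

    # Stationary point of the second-order Taylor expansion.
    return float(peak_idx) - d1 / d2
```

For a noise-free Gaussian the log-score is exactly quadratic, so this sketch recovers the continuous peak location; on real score curves it is only an approximation. Multiplying the refined index by the snippet duration would map it back to a timestamp in the original video.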