Temporal action localization in videos remains a significant challenge in computer vision. Boundary-sensitive methods have been widely adopted, but they make incomplete use of intermediate and global information and rely on an inefficient proposal feature generator. To address these limitations, we propose a novel framework, Sparse Multilevel Boundary Generator (SMBG), which enhances the boundary-sensitive method with boundary classification and action completeness regression. SMBG features a multi-level boundary module that enables faster processing by gathering boundary information at different lengths. Additionally, we introduce a sparse extraction confidence head that distinguishes information inside and outside the action, further optimizing the proposal feature generator. To improve the synergy between multiple branches and to balance positive and negative samples, we propose a global guidance loss. We evaluate our method on two popular benchmarks, ActivityNet-1.3 and THUMOS14, where it achieves state-of-the-art performance with faster inference (2.47× the speed of BSN++ and 2.12× the speed of DBG). These results demonstrate that SMBG provides a simpler and more efficient solution for generating temporal action proposals. Our proposed framework has the potential to improve both the accuracy and the speed of temporal action localization in video analysis. The code and models are available at \url{https://github.com/zhouyang-001/SMBG-for-temporal-action-proposal}.