Temporal language grounding (TLG) aims to localize a video segment in an untrimmed video based on a natural language description. To alleviate the high cost of manually annotating temporal boundary labels, we focus on the weakly supervised setting, where only video-level descriptions are provided for training. Most existing weakly supervised methods generate a set of candidate segments and learn cross-modal alignment through a multiple-instance learning (MIL) framework. However, both the temporal structure of the video and the complex semantics of the sentence are lost during such learning. In this work, we propose a novel candidate-free framework, the Fine-grained Semantic Alignment Network (FSAN), for weakly supervised TLG. Instead of treating the sentence and candidate moments as monolithic units, FSAN learns token-by-clip cross-modal semantic alignment via an iterative cross-modal interaction module, generates a fine-grained cross-modal semantic alignment map, and performs grounding directly on top of this map. Extensive experiments on two widely used benchmarks, ActivityNet-Captions and DiDeMo, show that FSAN achieves state-of-the-art performance.
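To make the token-by-clip alignment idea concrete, below is a minimal sketch, not the authors' implementation: it builds a (tokens × clips) map from cosine similarities between token and clip embeddings, then reads out a segment as the longest run of clips whose relevance exceeds a threshold. The function names (`alignment_map`, `ground_from_map`), the max-over-tokens pooling, and the run-based readout are illustrative assumptions; FSAN learns its map through an iterative cross-modal interaction module rather than raw cosine similarity.

```python
# Hypothetical sketch of a token-by-clip alignment map and a simple
# grounding readout; shapes and the thresholding rule are assumptions,
# not the paper's actual method.
import torch
import torch.nn.functional as F

def alignment_map(token_emb: torch.Tensor, clip_emb: torch.Tensor) -> torch.Tensor:
    """token_emb: (T, D) sentence tokens; clip_emb: (C, D) video clips.
    Returns a (T, C) map of cosine similarities."""
    t = F.normalize(token_emb, dim=-1)
    c = F.normalize(clip_emb, dim=-1)
    return t @ c.t()

def ground_from_map(amap: torch.Tensor, thresh: float = 0.0):
    """Pool the (T, C) map over tokens to a per-clip relevance score,
    then return the longest contiguous run of clips above `thresh`
    (an assumed readout rule)."""
    scores = amap.max(dim=0).values          # (C,) per-clip relevance
    active = (scores > thresh).tolist()
    best, cur, start = (0, -1, -1), 0, 0
    for i, a in enumerate(active + [False]): # sentinel ends the last run
        if a:
            if cur == 0:
                start = i
            cur += 1
        else:
            if cur > best[0]:
                best = (cur, start, i - 1)
            cur = 0
    return best[1], best[2]                  # (start_clip, end_clip)

# Example: 6 tokens, 20 clips, 256-d embeddings (random, for illustration)
amap = alignment_map(torch.randn(6, 256), torch.randn(20, 256))
print(ground_from_map(amap))
```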