Temporal sentence grounding in videos~(TSGV), which aims to localize one target segment in an untrimmed video according to a given sentence query, has drawn increasing attention in the research community over the past few years. Unlike temporal action localization, TSGV is more flexible because it can locate complicated activities described in natural language, without being restricted to predefined action categories. At the same time, TSGV is more challenging because it requires both textual and visual understanding to semantically align the two modalities~(i.e., text and video). In this survey, we give a comprehensive overview of TSGV, which i) summarizes the taxonomy of existing methods, ii) provides a detailed description of the evaluation protocols~(i.e., datasets and metrics) used in TSGV, and iii) discusses in depth the potential problems of current benchmark designs and directions for further research. To the best of our knowledge, this is the first systematic survey on temporal sentence grounding. More specifically, we first discuss existing TSGV approaches by grouping them into four categories: two-stage methods, end-to-end methods, reinforcement learning-based methods, and weakly supervised methods. We then present the benchmark datasets and evaluation metrics used to assess current research progress. Finally, we discuss some limitations of TSGV by pointing out potential problems that the current evaluation protocols do not properly resolve, which may push forward more cutting-edge research in TSGV. In addition, we share our insights on several promising directions, including three typical tasks with new and practical settings based on TSGV.