PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points

Traditional temporal action detection (TAD) usually handles untrimmed videos with a small number of action instances from a single label (e.g., ActivityNet, THUMOS). However, this setting might be unrealistic, as different classes of actions often co-occur in practice. In this paper, we focus on the task of multi-label temporal action detection, which aims to localize all action instances in a multi-label untrimmed video. Multi-label TAD is more challenging because it requires fine-grained class discrimination within a single video and precise localization of co-occurring instances. To address this challenge, we extend the sparse query-based detection paradigm from traditional TAD and propose PointTAD, a multi-label TAD framework. Specifically, PointTAD introduces a small set of learnable query points to represent the important frames of each action instance. This point-based representation provides a flexible mechanism to localize both the discriminative frames at the boundaries and the important frames inside the action. Moreover, we perform action decoding with the Multi-level Interactive Module to capture both point-level and instance-level action semantics. Finally, PointTAD employs an end-to-end trainable framework based solely on RGB input for easy deployment. We evaluate our proposed method on two popular benchmarks and introduce the new metric of detection-mAP for multi-label TAD. Our model outperforms all previous methods by a large margin under the detection-mAP metric, and also achieves promising results under the segmentation-mAP metric. Code is available at https://github.com/MCG-NJU/PointTAD.
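
The abstract does not spell out how query points are read from the video or how they relate to a predicted segment, so the following is only a minimal sketch under stated assumptions, not the PointTAD implementation. It assumes that each query point is a fractional frame index, that a point's feature is obtained by linear interpolation between neighboring frame features, and that a segment can be taken as the span between the earliest and latest point. The temporal IoU helper is the standard overlap measure commonly used when matching detections to ground truth; whether detection-mAP uses it in exactly this form is likewise an assumption. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sample_feature(features: List[List[float]], t: float) -> List[float]:
    """Linearly interpolate a frame-level feature at fractional index t.

    Assumption: query points live on the frame axis of a feature sequence.
    """
    t = min(max(t, 0.0), float(len(features) - 1))  # clamp to the valid range
    lo = int(t)
    hi = min(lo + 1, len(features) - 1)
    w = t - lo
    return [(1 - w) * a + w * b for a, b in zip(features[lo], features[hi])]


def points_to_segment(points: Sequence[float]) -> Tuple[float, float]:
    """Assumed conversion: the segment spans the earliest and latest point."""
    return min(points), max(points)


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two (start, end) temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Toy usage: three frames with one-dimensional features.
frame_features = [[0.0], [2.0], [4.0]]
query_points = [1.5, 0.5, 2.0]  # hypothetical learned positions
point_features = [sample_feature(frame_features, p) for p in query_points]
segment = points_to_segment(query_points)
overlap = temporal_iou(segment, (1.0, 2.0))
```

In this sketch the points inside the segment carry their own sampled features, which is one way to picture how a point set can describe both the boundaries and the interior of an action.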
