In this paper, we propose a novel sequence verification task: distinguishing positive video pairs, which perform the same action sequence, from negative pairs, which contain step-level transformations yet still carry out the same task. The task is posed in an open-set setting and does not rely on prior action detection or segmentation, both of which require event-level or even frame-level annotations. To that end, we carefully reorganize two publicly available action-related datasets with a step-procedure-task structure. To evaluate methods thoroughly, we also collect a scripted video dataset that enumerates all kinds of step-level transformations in chemical experiments. In addition, we introduce a novel evaluation metric, Weighted Distance Ratio, to ensure that different step-level transformations are treated equivalently during evaluation. Finally, we introduce a simple but effective baseline, built on a transformer encoder with a novel sequence alignment loss, that better characterizes long-term dependencies between steps and outperforms other action recognition methods. Code and data will be released.