We present a semi-supervised learning approach to the temporal action segmentation task. The goal of the task is to temporally detect and segment actions in long, untrimmed procedural videos, where only a small set of videos is densely labelled and a large collection of videos is unlabelled. To this end, we propose two novel loss functions for the unlabelled data: an action affinity loss and an action continuity loss. The action affinity loss guides learning on the unlabelled samples by imposing action priors induced from the labelled set. The action continuity loss enforces the temporal continuity of actions and also provides frame-wise classification supervision. In addition, we propose an Adaptive Boundary Smoothing (ABS) approach that builds coarser action boundaries for more robust and reliable learning. The proposed loss functions and ABS were evaluated on three benchmarks. Results show that they significantly improved action segmentation performance with small amounts (5% and 10%) of labelled data and achieved results comparable to full supervision with 50% labelled data. Furthermore, ABS improved performance when integrated into fully-supervised learning.
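To make the idea of coarser action boundaries concrete, the following is a minimal sketch of one way hard frame labels could be softened around segment boundaries. It is not the ABS formulation itself, which the abstract does not specify. The function name `smooth_boundaries`, the `ratio` parameter, the linear blending, and the rule that the window scales with the shorter neighbouring segment are all assumptions made for illustration.

```python
def smooth_boundaries(labels, num_classes, ratio=0.2):
    """Turn hard frame labels into soft per-frame class distributions.

    Hypothetical scheme (an assumption, not the authors' ABS): around each
    boundary between two segments, blend the two classes linearly over a
    window whose half-width is `ratio` times the shorter adjacent segment.
    Keep `ratio` at or below 0.5 so neighbouring windows cannot overlap.
    """
    n = len(labels)
    # Start from one-hot targets for every frame.
    soft = [[1.0 if c == y else 0.0 for c in range(num_classes)] for y in labels]

    # Collect segments as (start, end_exclusive) runs of identical labels.
    segments = []
    start = 0
    for t in range(1, n + 1):
        if t == n or labels[t] != labels[start]:
            segments.append((start, t))
            start = t

    # Visit each pair of neighbouring segments, i.e. each boundary.
    for (s0, e0), (s1, e1) in zip(segments, segments[1:]):
        left, right = labels[s0], labels[s1]
        half = int(ratio * min(e0 - s0, e1 - s1))
        if half == 0:
            continue  # segments too short to smooth; keep the hard boundary
        # The boundary sits between frame e0 - 1 and frame e0.
        for t in range(e0 - half, e0 + half):
            # Weight of the right-hand class rises linearly across the window.
            w = (t - (e0 - half) + 0.5) / (2 * half)
            soft[t] = [0.0] * num_classes
            soft[t][left] = 1.0 - w
            soft[t][right] = w
    return soft
```

For example, `smooth_boundaries([0] * 8 + [1] * 8, num_classes=2, ratio=0.25)` leaves frames far from the boundary one-hot and replaces the four frames around it with mixtures of the two classes. Soft targets of this kind could be used with a cross-entropy loss in place of one-hot labels.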