Temporal action segmentation approaches have been very successful recently. However, annotating videos with frame-wise labels to train such models is very expensive and time-consuming. While weakly supervised methods trained using only ordered action lists require much less annotation effort, their performance is still much worse than that of fully supervised approaches. In this paper, we introduce timestamp supervision for the temporal action segmentation task. Timestamps, i.e., a single annotated frame for each action segment, require an annotation effort comparable to that of weakly supervised approaches, yet provide a stronger supervisory signal. To demonstrate the effectiveness of timestamp supervision, we propose an approach to train a segmentation model using only timestamp annotations. Our approach uses the model output and the annotated timestamps to generate frame-wise labels by detecting the action changes. We further introduce a confidence loss that forces the predicted probabilities to decrease monotonically as the distance to the timestamps increases. This ensures that all frames of an action, not only the most distinctive ones, are learned during training. The evaluation on four datasets shows that models trained with timestamp annotations achieve performance comparable to that of fully supervised approaches.
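The two mechanisms described in the abstract, expanding timestamps into frame-wise labels and the monotonic-decrease confidence loss, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The helper names `generate_labels` and `confidence_loss` are hypothetical. The change point between two consecutive timestamps is chosen here by maximizing the model's log-probabilities, a simplified stand-in for the paper's action-change detection. The confidence loss is assumed to be a hinge on any increase of the log-probability of the timestamp's action when moving away from it, limited to the window between neighbouring timestamps. Lists are used instead of tensors so the sketch runs without a deep learning framework.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # guards against log(0)


def _log(p: float) -> float:
    return math.log(max(p, EPS))


def generate_labels(probs: Sequence[Sequence[float]],
                    stamps: Sequence[int],
                    actions: Sequence[int]) -> List[int]:
    """Expand timestamps into frame-wise labels (hypothetical helper).

    probs[t][c] is the model's probability of class c at frame t,
    stamps are sorted frame indices (assumed one per action segment),
    and actions[i] is the label annotated at stamps[i].
    """
    if not stamps:
        raise ValueError("need at least one timestamp")
    T = len(probs)
    labels = [0] * T
    # Frames before the first / after the last timestamp inherit its label.
    for t in range(0, stamps[0] + 1):
        labels[t] = actions[0]
    for t in range(stamps[-1], T):
        labels[t] = actions[-1]
    for i in range(len(stamps) - 1):
        s, e = stamps[i], stamps[i + 1]
        a, b = actions[i], actions[i + 1]
        # Try every change point c in (s, e]: frames s..c-1 get a, c..e get b.
        best_c, best_score = s + 1, -math.inf
        for c in range(s + 1, e + 1):
            score = (sum(_log(probs[t][a]) for t in range(s, c))
                     + sum(_log(probs[t][b]) for t in range(c, e + 1)))
            if score > best_score:
                best_c, best_score = c, score
        for t in range(s, e + 1):
            labels[t] = a if t < best_c else b
    return labels


def confidence_loss(probs: Sequence[Sequence[float]],
                    stamps: Sequence[int],
                    actions: Sequence[int]) -> float:
    """Hinge penalty on any rise of log p(action) away from its timestamp."""
    T = len(probs)
    total, count = 0.0, 0
    for i, (ts, a) in enumerate(zip(stamps, actions)):
        # The window for this timestamp ends at the neighbouring timestamps.
        lo = stamps[i - 1] if i > 0 else 0
        hi = stamps[i + 1] if i + 1 < len(stamps) else T - 1
        logp = [_log(probs[t][a]) for t in range(T)]
        # Right side: log p must not increase as t moves away (t grows).
        for t in range(ts + 1, hi + 1):
            total += max(0.0, logp[t] - logp[t - 1])
            count += 1
        # Left side: log p must not increase as t moves away (t shrinks).
        for t in range(lo, ts):
            total += max(0.0, logp[t] - logp[t + 1])
            count += 1
    return total / max(count, 1)


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
             [0.2, 0.8], [0.1, 0.9], [0.1, 0.9]]
    stamps, actions = [1, 4], [0, 1]
    print(generate_labels(probs, stamps, actions))  # [0, 0, 0, 1, 1, 1]
    print(round(confidence_loss(probs, stamps, actions), 4))  # 0.0147
```

In the usage example the loss is positive only because frame 0 assigns action 0 a higher probability (0.9) than the timestamp frame 1 does (0.8). A prediction that peaks at every timestamp and falls off monotonically on both sides would give a loss of zero, which is how such a term pushes the model to spread confidence over the whole segment rather than only its most distinctive frames.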