Temporal action segmentation aims to densely label the frames of minutes-long videos with multiple action classes. For this long-range video understanding task, researchers have proposed an extensive collection of methods and examined their performance on various benchmarks. Despite the rapid development of action segmentation techniques in recent years, there has been no systematic survey of the field. To this end, in this survey we analyse and summarize the main contributions and trends for this task. Specifically, we first examine the task definition, common benchmarks, types of supervision, and popular evaluation measures. Furthermore, we systematically investigate two fundamental aspects of this topic, i.e., frame representation and temporal modeling, which are extensively studied in the literature. We then comprehensively review existing temporal action segmentation works, categorized by their form of supervision. Finally, we conclude our survey by identifying several open topics for research. In addition, we supplement our survey with a curated list of temporal action segmentation resources, which is available at https://github.com/atlas-eccv22/awesome-temporal-action-segmentation.