Temporal action segmentation (TAS) aims to densely label the frames of minutes-long videos, assigning each frame one of multiple action classes. For this long-range video understanding task, researchers have developed an extensive collection of methods and evaluated them on various benchmarks. Despite the rapid growth of TAS techniques in recent years, no systematic survey of the field has been conducted. In this survey, we analyze and summarize the most significant contributions and trends in this area. We first examine the task definition, common benchmarks, types of supervision, and prevalent evaluation measures. We then systematically examine two essential techniques, frame representation and temporal modeling, both of which have been studied extensively in the literature. Next, we conduct a thorough review of existing TAS works, categorized by their level of supervision, and conclude by identifying and emphasizing several research gaps. We have also curated a list of TAS resources, available at https://github.com/atlas-eccv22/awesome-temporal-action-segmentation.