Algorithms for the action segmentation task typically use temporal models to predict what action is occurring at each frame of a minutes-long daily activity. Recent studies have shown the potential of Transformers in modeling the relations among elements in sequential data. However, several major concerns arise when directly applying the Transformer to the action segmentation task, such as the lack of inductive biases with small training sets, the deficiency in processing long input sequences, and the limited ability of the decoder architecture to exploit temporal relations among multiple action segments to refine the initial predictions. To address these concerns, we design an efficient Transformer-based model for the action segmentation task, named ASFormer, with three distinctive characteristics: (i) We explicitly bring in the local connectivity inductive prior because of the high locality of features. This constrains the hypothesis space within a reliable scope and helps the action segmentation task learn a proper target function from small training sets. (ii) We apply a pre-defined hierarchical representation pattern that efficiently handles long input sequences. (iii) We carefully design the decoder to refine the initial predictions from the encoder. Extensive experiments on three public datasets demonstrate the effectiveness of our method. Code is available at \url{https://github.com/ChinaYi/ASFormer}.