Human actions in video sequences are characterized by the complex interplay between spatial features and their temporal dynamics. In this paper, we propose novel tensor representations for compactly capturing such higher-order relationships between visual features for the task of action recognition. Specifically, we propose two tensor-based feature representations: (i) the sequence compatibility kernel (SCK) and (ii) the dynamics compatibility kernel (DCK). The former builds on the spatio-temporal correlations between features, while the latter explicitly models the action dynamics of a sequence. We also explore a generalization of SCK, coined SCK(+), that operates on subsequences to capture the local-global interplay of correlations and can incorporate multi-modal inputs, e.g., skeleton 3D body joints and per-frame classifier scores obtained from deep learning models trained on videos. We introduce a linearization of these kernels that leads to compact and fast descriptors. We provide experiments on (i) 3D skeleton action sequences, (ii) fine-grained video sequences, and (iii) standard non-fine-grained videos. As our final representations are tensors that capture higher-order relationships of features, they relate to co-occurrences for robust fine-grained recognition. We use higher-order tensors together with so-called Eigenvalue Power Normalization (EPN), which has long been speculated to perform spectral detection of higher-order occurrences, thus detecting fine-grained relationships of features rather than merely counting features in action sequences. We prove that a tensor of order $r$, built from $Z^*$-dimensional features and coupled with EPN, indeed detects whether at least one higher-order occurrence is 'projected' into one of the $\binom{Z^*}{r}$ subspaces of dimension $r$ represented by the tensor, thus forming a Tensor Power Normalization metric endowed with $\binom{Z^*}{r}$ such 'detectors'.
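As a rough illustration of power-normalizing the spectrum of a co-occurrence tensor, the sketch below covers only the simplest case, order 2, where the tensor is a matrix and its spectrum comes from an ordinary eigendecomposition. The function names, the exponent `gamma = 0.5`, and the synthetic features are assumptions made for illustration. Higher orders are not covered, and this is not the authors' implementation.

```python
import numpy as np

def second_order_tensor(features):
    """Average outer product of N feature vectors: (N, Z) -> (Z, Z)."""
    features = np.asarray(features, dtype=float)
    return features.T @ features / features.shape[0]

def eigenvalue_power_normalization(matrix, gamma=0.5):
    """Raise the eigenvalues of a symmetric PSD matrix to the power gamma.

    gamma is a hypothetical default chosen for this sketch.
    """
    eigvals, eigvecs = np.linalg.eigh(matrix)
    # Guard against tiny negative eigenvalues caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    # Rebuild the matrix as V diag(lambda^gamma) V^T.
    return (eigvecs * eigvals ** gamma) @ eigvecs.T

# Synthetic features (assumption): 50 vectors of dimension 4,
# with one direction made deliberately dominant.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 4))
feats[:, 0] *= 10.0

M = second_order_tensor(feats)
M_epn = eigenvalue_power_normalization(M, gamma=0.5)
```

With `gamma = 0.5` the result is the matrix square root of `M`, so the dominant eigenvalue is damped relative to the smaller ones while the eigenvectors are left unchanged.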