Spatial and temporal modeling is central to few-shot action recognition. Most previous works focus on long-term temporal relation modeling based on high-level spatial representations and overlook two crucial ingredients: low-level spatial features and short-term temporal relations. The former carry rich local semantic information, and the latter capture the motion characteristics of adjacent frames. In this paper, we propose SloshNet, a new framework that revisits spatial and temporal modeling for few-shot action recognition in a finer manner. First, to exploit the low-level spatial features, we design a feature fusion architecture search module that automatically searches for the best combination of low-level and high-level spatial features. Next, inspired by recent transformers, we introduce a long-term temporal modeling module that models global temporal relations based on the extracted spatial appearance features. In parallel, we design a short-term temporal modeling module that encodes the motion characteristics between adjacent frame representations. The final predictions are then obtained by feeding the resulting spatial-temporal features to a common frame-level class prototype matcher. We extensively validate SloshNet on four few-shot action recognition datasets: Something-Something V2, Kinetics, UCF101, and HMDB51. It achieves favorable results against state-of-the-art methods on all four datasets.
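To make two of the ideas above concrete, the following is a minimal sketch, not SloshNet's actual modules. It assumes each video is already encoded as a list of per-frame feature vectors, uses plain adjacent-frame differences as a stand-in for the learned short-term temporal module, and uses a mean Euclidean distance to per-class averaged frames as a stand-in for the frame-level class prototype matcher. All function names are hypothetical.

```python
import math

def frame_differences(frames):
    """Short-term motion cue (assumed form): difference between adjacent frame features."""
    return [[b - a for a, b in zip(prev, nxt)]
            for prev, nxt in zip(frames, frames[1:])]

def class_prototype(videos):
    """Frame-level prototype: average the support videos of one class, frame by frame."""
    n = len(videos)
    num_frames, dim = len(videos[0]), len(videos[0][0])
    return [[sum(v[t][d] for v in videos) / n for d in range(dim)]
            for t in range(num_frames)]

def distance(video, prototype):
    """Mean Euclidean distance between temporally aligned frames."""
    return sum(math.dist(f, p) for f, p in zip(video, prototype)) / len(video)

def classify(query, support):
    """Return the label whose frame-level prototype is closest to the query."""
    prototypes = {label: class_prototype(videos)
                  for label, videos in support.items()}
    return min(prototypes, key=lambda label: distance(query, prototypes[label]))

# Toy 1-shot, 2-way episode with 3 frames of 2-dimensional features.
support = {
    "push": [[[0, 0], [1, 0], [2, 0]]],
    "pull": [[[2, 0], [1, 0], [0, 0]]],
}
query = [[0.1, 0], [0.9, 0], [2.1, 0]]
motion = frame_differences(support["push"][0])
label = classify(query, support)
```

In this toy episode the two classes contain the same frames in opposite order, so only a matcher that respects frame order (or a motion cue such as `frame_differences`) can separate them; a video-level average of the frames would not.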