Modelling diverse spatio-temporal dependencies is key to recognising human actions in skeleton sequences. Most existing methods rely heavily on hand-designed traversal rules or graph topologies to capture the dependencies between dynamic joints, which is inadequate for reflecting the relationships between distant yet important joints. Furthermore, because these methods adopt local operations, important long-range temporal information is not well explored. To address these issues, we propose LSTA-Net, a novel Long short-term Spatio-Temporal Aggregation Network that can effectively capture long- and short-range dependencies in a spatio-temporal manner. We design our model as a purely factorised architecture that alternates between spatial feature aggregation and temporal feature aggregation. To improve feature aggregation, a channel-wise attention mechanism is also designed and employed. Extensive experiments on three public benchmark datasets suggest that our approach captures both long- and short-range dependencies in the spatial and temporal domains, and achieves better results than other state-of-the-art methods. Code is available at https://github.com/tailin1009/LSTA-Net.
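To make the factorised idea concrete, the following is a minimal sketch of one block that first aggregates features across joints, then across frames, and finally reweights channels. It is not the LSTA-Net implementation: the function names, the averaging operators, the window parameter, and the parameter-free sigmoid gate are all assumptions chosen for illustration, whereas the actual model uses learned layers.

```python
import math

def spatial_aggregate(x, adjacency):
    """Mix features across joints within each frame.

    x is a list of T frames, each a list of V joints, each a list of C channels.
    adjacency is a V x V matrix assumed to include self-loops (hypothetical input).
    """
    out = []
    for frame in x:
        new_frame = []
        for v, row in enumerate(adjacency):
            total = sum(row)  # row normalisation, non-zero thanks to self-loops
            new_frame.append([
                sum(row[u] * frame[u][c] for u in range(len(frame))) / total
                for c in range(len(frame[v]))
            ])
        out.append(new_frame)
    return out

def temporal_aggregate(x, window):
    """Average each joint's features over a centred window of frames."""
    half = window // 2
    out = []
    for t in range(len(x)):
        span = x[max(0, t - half):min(len(x), t + half + 1)]
        out.append([
            [sum(f[v][c] for f in span) / len(span) for c in range(len(x[t][v]))]
            for v in range(len(x[t]))
        ])
    return out

def channel_attention(x):
    """Scale each channel by a sigmoid gate of its global mean (no learned weights)."""
    T, V, C = len(x), len(x[0]), len(x[0][0])
    means = [
        sum(x[t][v][c] for t in range(T) for v in range(V)) / (T * V)
        for c in range(C)
    ]
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    return [[[x[t][v][c] * gates[c] for c in range(C)] for v in range(V)]
            for t in range(T)]

def factorised_block(x, adjacency, window):
    """Apply spatial aggregation, then temporal aggregation, then channel gating."""
    x = spatial_aggregate(x, adjacency)
    x = temporal_aggregate(x, window)
    return channel_attention(x)

# Toy input: 3 joints in a chain (0-1-2) with self-loops, 4 frames, 2 channels.
adjacency = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
sequence = [[[float(t + v), 1.0] for v in range(3)] for t in range(4)]
features = factorised_block(sequence, adjacency, window=3)
```

In this sketch the range of each dependency is set by the inputs: a denser adjacency matrix connects more distant joints, and a larger window spans more frames. Stacking several such blocks alternates the spatial and temporal steps, as the abstract describes.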