Long-term complex activity recognition and localisation can be crucial for the decision-making process of several autonomous systems, such as smart cars and surgical robots. Nonetheless, most current methods are designed to merely localise short-term actions/activities, or combinations of atomic actions, that only last for a few frames or seconds. In this paper, we address the problem of long-term complex activity detection via a novel deformable, spatiotemporal parts-based model. Our framework consists of three main building blocks: (i) action tube detection, (ii) the modelling of the deformable geometry of parts, and (iii) a sparsity mechanism. Firstly, action tubes are detected in a series of snippets using an action tube detector. Next, a new 3D deformable RoI pooling layer is designed to learn the flexible, deformable geometry of the constellation of parts. Finally, a sparsity strategy differentiates between activated and deactivated features. We also provide temporal complex activity annotations for the recently released ROAD autonomous driving dataset and the SARAS-ESAD surgical action dataset, to validate our method and show the adaptability of our framework to different domains. As they both contain long videos portraying long-term activities, they can be used as benchmarks for future work in this area.
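To make the second building block more concrete, below is a minimal, hypothetical sketch of what a 3D deformable RoI pooling layer could look like in PyTorch. The class name `Deformable3DRoIPool`, the bin grid size, the `offset_head` predictor, and the offset scaling factor are all illustrative assumptions, not the paper's actual implementation: the idea shown is simply that a spatio-temporal RoI (an action tube) is split into a regular grid of bins, a learnable per-bin offset deforms those bins, and features are pooled at the shifted locations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Deformable3DRoIPool(nn.Module):
    """Hypothetical sketch of a 3D deformable RoI pooling layer.

    Splits a spatio-temporal RoI (an action tube) into a grid of bins,
    predicts a learnable (dt, dy, dx) offset per bin, and pools the
    feature volume at the shifted bin centres, so the constellation of
    parts can deform relative to the rigid tube.
    """

    def __init__(self, channels, grid=(4, 7, 7)):
        super().__init__()
        self.grid = grid  # (bins_t, bins_h, bins_w) -- assumed values
        n_bins = grid[0] * grid[1] * grid[2]
        # Small head predicting one 3D offset per bin from pooled context.
        self.offset_head = nn.Linear(channels, 3 * n_bins)

    def forward(self, feats, tube):
        # feats: (C, T, H, W) feature volume of one snippet.
        # tube:  (t1, t2, y1, y2, x1, x2), coordinates normalised to [-1, 1].
        C, T, H, W = feats.shape
        gt, gh, gw = self.grid
        t1, t2, y1, y2, x1, x2 = tube

        # Regular bin centres inside the tube (the rigid geometry).
        ts = torch.linspace(t1, t2, gt)
        ys = torch.linspace(y1, y2, gh)
        xs = torch.linspace(x1, x2, gw)
        grid_t, grid_y, grid_x = torch.meshgrid(ts, ys, xs, indexing="ij")
        centres = torch.stack([grid_x, grid_y, grid_t], dim=-1)  # (gt, gh, gw, 3)

        # Predict per-bin offsets from the global context of the snippet.
        context = feats.mean(dim=(1, 2, 3))                 # (C,)
        offsets = self.offset_head(context).view(gt, gh, gw, 3)
        sample_grid = (centres + 0.1 * offsets).clamp(-1, 1)  # 0.1: assumed scale

        # Trilinear sampling at the deformed bin centres.
        pooled = F.grid_sample(
            feats.unsqueeze(0),          # (1, C, T, H, W)
            sample_grid.unsqueeze(0),    # (1, gt, gh, gw, 3)
            align_corners=True,
        )
        return pooled.squeeze(0)         # (C, gt, gh, gw)
```

In this sketch the deformation is driven by a single global context vector for simplicity; a per-part predictor conditioned on local features would be a natural refinement, and the sparsity mechanism mentioned in the abstract could then gate the pooled part features.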