Activity detection in surveillance videos is challenging because of small objects, complex activity categories, and the untrimmed nature of the footage. Existing methods are generally limited by inaccurate proposals, weak classifiers, or inadequate post-processing. In this work, we propose a comprehensive and effective activity detection system for untrimmed surveillance videos that handles both person-centered and vehicle-centered activities. It consists of four modules: an object localizer, a proposal filter, an activity classifier, and an activity refiner. For person-centered activities, we propose a novel part-attention mechanism that explores detailed features in different body parts. For vehicle-centered activities, we propose a localization masking method that jointly encodes motion and foreground attention features. We conduct experiments on the large-scale activity detection dataset VIRAT and achieve the best results for both groups of activities. Furthermore, our team won 1st place in the TRECVID 2021 ActEV challenge.