In this technical report, we introduce our updates to YOWO, a real-time method for spatio-temporal action detection. We make a handful of minor design changes to improve it. For the network structure, we use the same networks as the official YOWO implementation, namely 3D-ResNext-101 and YOLOv2, but we load a better pretrained weight from our reimplemented YOLOv2, which outperforms the official YOLOv2. We also optimize the label assignment used in YOWO. To detect action instances more accurately, we deploy the GIoU loss for box regression. With these incremental improvements, YOWO achieves 84.9\% frame mAP and 50.5\% video mAP on UCF101-24, significantly higher than the official YOWO. On AVA, our optimized YOWO achieves 20.6\% frame mAP with 16 input frames, also exceeding the official YOWO. With 32 input frames, our YOWO achieves 21.6\% frame mAP at 25 FPS on an RTX 3090 GPU. We name the optimized YOWO YOWO-Plus. Moreover, we replace 3D-ResNext-101 with the efficient 3D-ShuffleNet-v2 to design a lightweight action detector, YOWO-Nano. YOWO-Nano achieves 81.0\% frame mAP and 49.7\% video mAP at over 90 FPS on UCF101-24. It also achieves 18.4\% frame mAP at about 90 FPS on AVA. As far as we know, YOWO-Nano is the fastest state-of-the-art action detector. Our code is available at https://github.com/yjh0410/PyTorch_YOWO.
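For readers unfamiliar with the GIoU loss mentioned above, the following is a minimal PyTorch sketch of it, not code taken from the YOWO-Plus repository. It assumes axis-aligned boxes in $(x_1, y_1, x_2, y_2)$ format; the function name and signature are illustrative.

\begin{verbatim}
import torch

def giou_loss(pred, target, eps=1e-7):
    """GIoU loss for boxes in (x1, y1, x2, y2) format.

    pred, target: float tensors of shape [N, 4].
    Returns the per-box loss, shape [N].
    """
    # Intersection rectangle of each predicted/target pair.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # Union area of the two boxes.
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)

    # Smallest box enclosing both; GIoU penalizes its empty area,
    # so the loss still gives a gradient when the boxes do not overlap.
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    enclose = (ex2 - ex1) * (ey2 - ey1)

    giou = iou - (enclose - union) / (enclose + eps)
    return 1.0 - giou
\end{verbatim}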