Time and Frequency Network for Human Action Detection in Videos

Most current deep learning approaches to human action detection in videos rely on spatiotemporal features but neglect important features in the frequency domain. In this work, we propose TFNet, an end-to-end network that considers time and frequency features simultaneously. TFNet has two branches. The time branch is a three-dimensional convolutional neural network (3D-CNN) that takes the image sequence as input and extracts time features. The frequency branch is a two-dimensional convolutional neural network (2D-CNN) that extracts frequency features from discrete cosine transform (DCT) coefficients. Finally, to obtain the action patterns, the two sets of features are deeply fused under an attention mechanism. Experimental results on the JHMDB51-21 and UCF101-24 datasets demonstrate that our approach achieves remarkable frame-mAP performance.
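To make the input of the frequency branch more concrete, the following is a minimal sketch of how DCT coefficients might be computed from a single grayscale frame. The abstract does not say how TFNet obtains its coefficients, so the 8x8 block size, the orthonormal DCT-II normalization (as used in JPEG-style coding), and the helper names `dct_matrix`, `dct2`, and `block_dct` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II basis matrix as nested lists."""
    rows = []
    for k in range(n):
        # The DC row (k == 0) uses a different scale so that the basis is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2D DCT-II of a square block, computed as C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def block_dct(frame, size=8):
    """Split a grayscale frame into size x size blocks and apply dct2 to each.

    Assumption: blocks that would extend past the frame border are skipped.
    Returns a dict mapping (block_row, block_col) to a coefficient matrix.
    """
    height, width = len(frame), len(frame[0])
    coeffs = {}
    for y in range(0, height - size + 1, size):
        for x in range(0, width - size + 1, size):
            block = [row[x:x + size] for row in frame[y:y + size]]
            coeffs[(y // size, x // size)] = dct2(block)
    return coeffs
```

For a constant frame, all of the energy of each block lands in the DC coefficient at position (0, 0) and every other coefficient is numerically zero. In a two-branch design like the one described, per-block coefficient maps of this kind would be one plausible 2D-CNN input, while the raw frame sequence would go to the 3D-CNN.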

