TSI: Temporal Saliency Integration for Video Action Recognition - 专知论文 (Zhuanzhi Papers)

December 17, 2021

TSI: Temporal Saliency Integration for Video Action Recognition

Haisheng Su, Kunchang Li, Jinyuan Feng, Dongliang Wang, Weihao Gan, Wei Wu, Yu Qiao

From arXiv; submitted to CVPR 2022

Efficient spatiotemporal modeling is an important yet challenging problem in video action recognition. Existing state-of-the-art methods exploit differences between neighboring features to obtain motion cues for short-term temporal modeling with a simple convolution. However, a single local convolution cannot handle the full variety of actions because of its limited receptive field. In addition, action-irrelevant noise introduced by camera movement degrades the quality of the extracted motion features. In this paper, we propose a Temporal Saliency Integration (TSI) block, which mainly consists of a Salient Motion Excitation (SME) module and a Cross-perception Temporal Integration (CTI) module. Specifically, SME highlights motion-sensitive areas through spatial-level local-global motion modeling, in which saliency alignment and pyramidal motion modeling are performed successively between adjacent frames to capture motion dynamics with less noise from misaligned backgrounds. CTI performs multi-perception temporal modeling through a group of separate 1D convolutions, and integrates temporal interactions across the different perceptions with an attention mechanism. Together, these two modules encode both long-term and short-term temporal relationships efficiently while introducing only a limited number of additional parameters. Extensive experiments on several popular benchmarks (Something-Something V1 & V2, Kinetics-400, UCF-101, and HMDB-51) demonstrate the effectiveness of the proposed method.
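The abstract gives no implementation details, so the following is only a minimal NumPy sketch of two ideas it mentions: neighboring feature differences as a motion cue, and a group of separate temporal 1D convolutions with different kernel sizes whose outputs are fused by attention weights. It is not the authors' TSI block. All function names, the `(T, C)` feature layout, the kernel sizes, and the softmax-over-pooled-descriptors attention are assumptions chosen for illustration, and the saliency alignment and pyramidal modeling of SME are not modeled here.

```python
import numpy as np

def frame_difference(x):
    """Adjacent-frame feature differences (hypothetical helper).
    x: (T, C) per-frame features. The last step is zero-padded so the
    output keeps the shape (T, C)."""
    diff = x[1:] - x[:-1]
    return np.concatenate([diff, np.zeros_like(x[:1])], axis=0)

def temporal_conv1d(x, kernel):
    """Depthwise 'same' 1D convolution along time (cross-correlation,
    as is usual in deep learning). x: (T, C), kernel: (K, C), K odd."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero padding on the time axis
    out = np.zeros_like(x, dtype=float)
    for t in range(x.shape[0]):
        out[t] = (xp[t:t + k] * kernel).sum(axis=0)
    return out

def softmax(z):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_perception_integration(x, kernels, w_attn):
    """Run one temporal conv per kernel (different receptive fields),
    then fuse the branches with attention weights.
    Assumed attention: softmax over scores computed from each branch's
    time-averaged descriptor and a projection vector w_attn of shape (C,)."""
    branches = np.stack([temporal_conv1d(x, k) for k in kernels])  # (B, T, C)
    descriptors = branches.mean(axis=1)                            # (B, C)
    weights = softmax(descriptors @ w_attn)                        # (B,)
    fused = (weights[:, None, None] * branches).sum(axis=0)        # (T, C)
    return fused, weights

# Usage with random data: 8 frames, 4 channels, kernel sizes 1, 3 and 5.
rng = np.random.default_rng(0)
T, C = 8, 4
x = rng.standard_normal((T, C))
motion = frame_difference(x)
kernels = [0.1 * rng.standard_normal((k, C)) for k in (1, 3, 5)]
w_attn = rng.standard_normal(C)
fused, weights = multi_perception_integration(motion, kernels, w_attn)
print(fused.shape)  # (8, 4)
```

Larger kernels give a branch a longer temporal receptive field, which is one way to read the abstract's point that a single local convolution is too limited; the attention weights then decide how much each receptive field contributes.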

