视频模型中的 " 单站式 " 场际关注 (Stand-Alone Inter-Frame Attention in Video Models) - 专知论文

会员服务 ·

0

Attention · Extensibility · MoDELS · 块 · Learning ·

2022 年 6 月 14 日

Stand-Alone Inter-Frame Attention in Video Models

翻译：视频模型中的 " 单站式 " 场际关注

Fuchen Long,Zhaofan Qiu,Yingwei Pan,Ting Yao,Jiebo Luo,Tao Mei

from arxiv, CVPR 2022; Code is publicly available at: https://github.com/FuchenUSTC/SIFA

Motion, as the uniqueness of a video, has been critical to the development of video understanding models. Modern deep learning models leverage motion by either executing spatio-temporal 3D convolutions, factorizing 3D convolutions into spatial and temporal convolutions separately, or computing self-attention along temporal dimension. The implicit assumption behind such successes is that the feature maps across consecutive frames can be nicely aggregated. Nevertheless, the assumption may not always hold especially for the regions with large deformation. In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location. Technically, SIFA remoulds the deformable design via re-scaling the offset predictions by the difference between two frames. Taking each spatial location in the current frame as the query, the locally deformable neighbors in the next frame are regarded as the keys/values. Then, SIFA measures the similarity between query and keys as stand-alone attention to weighted average the values for temporal aggregation. We further plug SIFA block into ConvNets and Vision Transformer, respectively, to devise SIFA-Net and SIFA-Transformer. Extensive experiments conducted on four video datasets demonstrate the superiority of SIFA-Net and SIFA-Transformer as stronger backbones. More remarkably, SIFA-Transformer achieves an accuracy of 83.1% on Kinetics-400 dataset. Source code is available at \url{https://github.com/FuchenUSTC/SIFA}.

翻译：作为视频的独特性,对于视频理解模型的发展至关重要。现代深层学习模型通过执行时空三维演进,将三维演进分解成空间和时间演进,或者根据时间层面计算自我自留。这些成功背后的隐含假设是,连续框架的地貌图可以很好地加以汇总。然而,这一假设可能并不总是特别适用于出现大变形的区域。在本文中,我们提出了一种新的框架间关注区块的配方,即独立方际注意(SIFA),这种模式以新颖的方式进入框架间变异,以估计每个空间位置的局部和时间级变异,将三维演进到空间和时间层面的局部自我保持,从技术上说,SIFA通过重新缩放,将当前框架中的每个空间位置视为关键/价值。然后,SIFA的查询和键间对内部基流流流流流数据进行进一步测量,将SIFA系统-直流的直径直径直径直流直径直径直径,将SIFA的S-直径直径直径直径,将S-IFA的S-FA的S-IFA数据转换为S-IFA的基FA的系统-直径直径流数据转换为S-IFA的基FA 和S-I-I-I-I-I-I-I-S-I-I-I-I-I-I-I-I-I-I-S-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-

0

相关内容

Attention

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

开放知识图谱

0+阅读 · 2021年9月9日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

两类带导数的非线性Schrodinger方程拟周期解的存在性

国家自然科学基金

0+阅读 · 2015年12月31日

罗巴代数的表示和罗巴代数在operad中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

关于 Finsler 流形上调和映射与 Laplacian 的若干问题研究

国家自然科学基金

1+阅读 · 2014年12月31日

新疆天山北坡经济带PM_2.5时空分布与LUCC的关联性研究

国家自然科学基金

0+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

从初边值问题到双曲方程的低维表示

国家自然科学基金

0+阅读 · 2012年12月31日

光滑函数类上的几个逼近问题

国家自然科学基金

0+阅读 · 2012年12月31日

微分方程的分支理论

国家自然科学基金

0+阅读 · 2012年12月31日

IKK与JNK信号通路促进炎性衰老的分子机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

不可压Navier-Stokes方程的适定性与正则性研究

国家自然科学基金

0+阅读 · 2009年12月31日

Negative Frames Matter in Egocentric Visual Query 2D Localization

Arxiv

0+阅读 · 2022年8月3日

Per-Clip Video Object Segmentation

Arxiv

0+阅读 · 2022年8月3日

Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks

Arxiv

0+阅读 · 2022年8月1日

ATCA: an Arc Trajectory Based Model with Curvature Attention for Video Frame Interpolation

ATCA: an Arc Trajectory Based Model with Curvature Attention for Video Frame Interpolation

Arxiv

0+阅读 · 2022年8月1日

End-to-end View Synthesis via NeRF Attention

End-to-end View Synthesis via NeRF Attention

Arxiv

0+阅读 · 2022年8月1日

Consistent Quality Oriented Rate Control in HEVC via Balancing Intra and Inter Frame Coding

Arxiv

0+阅读 · 2022年7月30日

Terrain-Adaptive, ALIP-Based Bipedal Locomotion Controller via Model Predictive Control and Virtual Constraints

Arxiv

0+阅读 · 2022年7月28日

Visual Attention Methods in Deep Learning: An In-Depth Survey

Arxiv

44+阅读 · 2022年4月16日

Transformers in Time Series: A Survey

Arxiv

34+阅读 · 2022年2月15日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

VIP会员

文章信息

相关主题

相关VIP内容

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《美国海军陆战队软件定义网络应用案例：分布式防火墙自动化系统》148页

《多体环境下定位导航授时（PNT）系统研究》228页

软件定义无线电（SDR）：商业与军事领域的技术、应用及未来趋势

《攻势防空作战中无人追击者/规避者最优轨迹研究（含动态交战区建模）》95页

相关资讯

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

开放知识图谱

0+阅读 · 2021年9月9日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Negative Frames Matter in Egocentric Visual Query 2D Localization

Arxiv

0+阅读 · 2022年8月3日

Per-Clip Video Object Segmentation

Arxiv

0+阅读 · 2022年8月3日

Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks

Arxiv

0+阅读 · 2022年8月1日

ATCA: an Arc Trajectory Based Model with Curvature Attention for Video Frame Interpolation

ATCA: an Arc Trajectory Based Model with Curvature Attention for Video Frame Interpolation

Arxiv

0+阅读 · 2022年8月1日

End-to-end View Synthesis via NeRF Attention

End-to-end View Synthesis via NeRF Attention

Arxiv

0+阅读 · 2022年8月1日

Consistent Quality Oriented Rate Control in HEVC via Balancing Intra and Inter Frame Coding

Arxiv

0+阅读 · 2022年7月30日

Terrain-Adaptive, ALIP-Based Bipedal Locomotion Controller via Model Predictive Control and Virtual Constraints

Arxiv

0+阅读 · 2022年7月28日

Visual Attention Methods in Deep Learning: An In-Depth Survey

Arxiv

44+阅读 · 2022年4月16日

Transformers in Time Series: A Survey

Arxiv

34+阅读 · 2022年2月15日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

相关基金

两类带导数的非线性Schrodinger方程拟周期解的存在性

国家自然科学基金

0+阅读 · 2015年12月31日

罗巴代数的表示和罗巴代数在operad中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

关于 Finsler 流形上调和映射与 Laplacian 的若干问题研究

国家自然科学基金

1+阅读 · 2014年12月31日

新疆天山北坡经济带PM_2.5时空分布与LUCC的关联性研究

国家自然科学基金

0+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

从初边值问题到双曲方程的低维表示

国家自然科学基金

0+阅读 · 2012年12月31日

光滑函数类上的几个逼近问题

国家自然科学基金

0+阅读 · 2012年12月31日

微分方程的分支理论

国家自然科学基金

0+阅读 · 2012年12月31日

IKK与JNK信号通路促进炎性衰老的分子机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

不可压Navier-Stokes方程的适定性与正则性研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员