" 变异之眼:以地球为中心加热估计的全球-地方关联 " (In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation) - 专知论文


August 10, 2022

In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation


Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg

From arXiv, 23 pages

In this paper, we present the first transformer-based model to address the challenging problem of egocentric gaze estimation. We observe that the connection between the global scene context and local visual information is vital for localizing the gaze fixation from egocentric video frames. To this end, we design the transformer encoder to embed the global context as one additional visual token and further propose a novel Global-Local Correlation (GLC) module to explicitly model the correlation between the global token and each local token. We validate our model on two egocentric video datasets, EGTEA Gaze+ and Ego4D. Our detailed ablation studies demonstrate the benefits of our method. In addition, our approach exceeds the previous state of the art by a large margin. We also provide additional visualizations to support our claim that global-local correlation serves as a key representation for predicting gaze fixation from egocentric videos. More details can be found on our website (https://bolinlai.github.io/GLC-EgoGazeEst).
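To make the idea of correlating one global token with every local token concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' GLC module: the mean pooling used to build the global token, the scaled dot product used as the correlation score, and the function names `mean_pool` and `global_local_correlation` are all assumptions introduced for this example.

```python
import math


def mean_pool(tokens):
    """Average the local tokens into one global token (assumed pooling choice)."""
    n = len(tokens)
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_local_correlation(local_tokens):
    """Score each local token against the global token.

    Hypothetical simplification: the score is a scaled dot product,
    normalized with softmax so the weights sum to 1.
    """
    global_token = mean_pool(local_tokens)
    dim = len(global_token)
    scores = [
        sum(g * x for g, x in zip(global_token, tok)) / math.sqrt(dim)
        for tok in local_tokens
    ]
    return softmax(scores)


# Four toy "patch" tokens with two feature channels each.
local_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
weights = global_local_correlation(local_tokens)
```

In this toy input the last token is the one most aligned with the pooled global context, so it receives the largest weight. A real model would learn projections for the tokens rather than use raw features.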

