Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction


January 22, 2023


Razvan-George Pasca, Alexey Gavryushin, Yen-Ling Kuo, Otmar Hilliges, Xi Wang

We study the task of object interaction anticipation in egocentric videos. Successful prediction of future actions and objects requires an understanding of the spatio-temporal context formed by past actions and object relationships. We propose TransFusion, a multimodal transformer-based architecture that makes effective use of the representational power of language by summarizing past actions concisely. TransFusion leverages pre-trained image captioning models and summarizes the captions, focusing on past actions and objects. This action context, together with a single input frame, is processed by a multimodal fusion module to forecast the next object interactions. Our model enables more efficient end-to-end learning by replacing dense video features with language representations, allowing us to benefit from knowledge encoded in large pre-trained models. Experiments on Ego4D and EPIC-KITCHENS-100 show the effectiveness of our multimodal fusion model and the benefits of using language-based context summaries. Our method outperforms state-of-the-art approaches by 40.4% in overall mAP on the Ego4D test set. We show the generality of TransFusion via experiments on EPIC-KITCHENS-100. Video and code are available at: https://eth-ait.github.io/transfusion-proj/.
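The data flow described in the abstract (captions of past frames, a concise action summary, then fusion with a single frame) can be illustrated with a toy sketch. This is not the authors' implementation: the vocabularies, the `Action` type, and the functions `summarize_captions` and `build_fusion_input` are hypothetical names introduced here, the captions are hand-written stand-ins for the output of a pre-trained captioning model, and the transformer that would consume the fused sequence is omitted.

```python
from dataclasses import dataclass

# Hypothetical toy vocabularies (assumption); the real system relies on
# pre-trained models rather than keyword lookup.
VERBS = {"take", "cut", "put", "open", "wash"}
NOUNS = {"knife", "onion", "board", "drawer", "pan"}


@dataclass(frozen=True)
class Action:
    verb: str
    noun: str


def summarize_captions(captions):
    """Reduce free-form captions to a short list of (verb, noun) actions."""
    actions = []
    for caption in captions:
        words = [w.strip(".,").lower() for w in caption.split()]
        verb = next((w for w in words if w in VERBS), None)
        noun = next((w for w in words if w in NOUNS), None)
        if verb and noun:
            action = Action(verb, noun)
            # Skip consecutive repeats so the summary stays concise.
            if not actions or actions[-1] != action:
                actions.append(action)
    return actions


def build_fusion_input(actions, frame_features):
    """Form one token sequence: language tokens first, then visual tokens."""
    text_tokens = [("text", f"{a.verb} {a.noun}") for a in actions]
    image_tokens = [("image", f) for f in frame_features]
    return text_tokens + image_tokens


# Hand-written captions standing in for captioning-model output (assumption).
captions = [
    "open the drawer",
    "open the drawer",
    "take a knife",
    "cut the onion on the board",
]
summary = summarize_captions(captions)

# Two placeholder numbers stand in for patch features of the single input frame.
fusion_input = build_fusion_input(summary, frame_features=[0.1, 0.2])
```

In this sketch the language summary is far shorter than a stack of per-frame video features would be, which is the efficiency argument the abstract makes; a fusion model would attend jointly over the text and image tokens in `fusion_input`.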

