Modeling expressive cross-modal interactions seems crucial in multimodal tasks, such as visual question answering. However, sometimes high-performing black-box algorithms turn out to be mostly exploiting unimodal signals in the data. We propose a new diagnostic tool, empirical multimodally-additive function projection (EMAP), for isolating whether or not cross-modal interactions improve performance for a given model on a given task. This function projection modifies model predictions so that cross-modal interactions are eliminated, isolating the additive, unimodal structure. For seven image+text classification tasks (on each of which we set new state-of-the-art benchmarks), we find that, in many cases, removing cross-modal interactions results in little to no performance degradation. Surprisingly, this holds even when expressive models, with capacity to consider interactions, otherwise outperform less expressive models; thus, performance improvements, even when present, often cannot be attributed to consideration of cross-modal feature interactions. We hence recommend that researchers in multimodal machine learning report the performance not only of unimodal baselines, but also the EMAP of their best-performing model.
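As a concrete illustration of the projection described above, the sketch below estimates a multimodally-additive projection of a model's predictions from its logits on all text/image pairings of an evaluation set: each matched pair's prediction is replaced by a text-only mean plus an image-only mean minus a grand mean, which averages out cross-modal interaction terms. The array name `pairwise_logits` and the helper `emap` are illustrative assumptions; this is a minimal sketch of the idea, not the paper's reference implementation.

```python
import numpy as np

def emap(pairwise_logits: np.ndarray) -> np.ndarray:
    """Empirical multimodally-additive projection of model predictions.

    pairwise_logits[i, j, :] holds the model's class logits when text input i
    is paired with image input j (shape: [N, N, n_classes], where the matched
    evaluation pairs lie on the diagonal i == j).
    Returns projected logits for the matched pairs, in which cross-modal
    interaction terms have been averaged out, leaving only additive
    (unimodal) structure plus a constant offset.
    """
    text_means = pairwise_logits.mean(axis=1)       # [N, C]: text-only contribution
    image_means = pairwise_logits.mean(axis=0)      # [N, C]: image-only contribution
    grand_mean = pairwise_logits.mean(axis=(0, 1))  # [C]:    global offset
    return text_means + image_means - grand_mean    # [N, C]: additive projection

# Usage sketch: compare the original model's accuracy on the matched pairs
# (the diagonal of pairwise_logits) against the accuracy of its EMAP.
# pairwise_logits = ...  # [N, N, C], computed by running the model on all pairings
# original_preds = pairwise_logits[np.arange(len(pairwise_logits)),
#                                  np.arange(len(pairwise_logits))].argmax(-1)
# emap_preds = emap(pairwise_logits).argmax(-1)
```

If the two accuracies are close, the model's performance on the task is largely attributable to unimodal (additive) structure rather than to cross-modal feature interactions.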