VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers - Zhuanzhi Papers


Multimodal · INTERACT · Attention · Transformer · Vision

August 22, 2022

VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers


Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal

From arXiv; Best Demo Award at CVPR 2022

Breakthroughs in transformer-based models have revolutionized not only the NLP field but also vision and multimodal systems. However, although visualization and interpretability tools have become available for NLP models, the internal mechanisms of vision and multimodal transformers remain largely opaque. With the success of these transformers, it is increasingly critical to understand their inner workings, as unraveling these black boxes will lead to more capable and trustworthy models. To contribute to this quest, we propose VL-InterpreT, which provides novel interactive visualizations for interpreting the attentions and hidden representations in multimodal transformers. VL-InterpreT is a task-agnostic, integrated tool that (1) tracks a variety of statistics in attention heads throughout all layers for both vision and language components, (2) visualizes cross-modal and intra-modal attentions through easily readable heatmaps, and (3) plots the hidden representations of vision and language tokens as they pass through the transformer layers. In this paper, we demonstrate the functionalities of VL-InterpreT through the analysis of KD-VLP, an end-to-end pretraining vision-language multimodal transformer-based model, on the tasks of Visual Commonsense Reasoning (VCR) and WebQA, two visual question answering benchmarks. We also present a few interesting findings about multimodal transformer behaviors that were learned through our tool.
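
To make the idea of per-head cross-modal and intra-modal attention statistics more concrete, here is a minimal sketch in plain Python. It is not VL-InterpreT's actual implementation. The function name `modality_attention_stats`, the assumption that language tokens come first in the sequence, and the toy attention matrix are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def modality_attention_stats(attn: List[List[float]], n_text: int) -> Dict[str, float]:
    """Average attention mass per query token, split by source and target modality.

    attn[i][j] is the attention weight from query token i to key token j
    for a single head. ASSUMPTION: tokens 0..n_text-1 are language tokens
    and the remaining tokens are vision tokens.
    """
    n = len(attn)
    if not 0 < n_text < n:
        raise ValueError("need at least one text token and one vision token")

    sums = {
        "text->text": 0.0,
        "text->vision": 0.0,
        "vision->text": 0.0,
        "vision->vision": 0.0,
    }
    for i, row in enumerate(attn):
        src = "text" if i < n_text else "vision"
        for j, weight in enumerate(row):
            dst = "text" if j < n_text else "vision"
            sums[f"{src}->{dst}"] += weight

    # Normalize by the number of query tokens in the source modality.
    counts = {"text": n_text, "vision": n - n_text}
    return {key: total / counts[key.split("->")[0]] for key, total in sums.items()}


# Toy head with two language tokens followed by one vision token (made-up values).
example = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
]
stats = modality_attention_stats(example, n_text=2)
print(stats)
```

Running a summary like this for every head in every layer would give a grid of values that could be rendered as a heatmap, which is the kind of overview the abstract describes. The exact statistics the tool tracks are not specified here.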

