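CVPR 2023 Is Underway! Latest Tutorial from Google and Others, "Understanding and Interpreting Attention in Vision," with a 152-Page Slide Deck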

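From June 18 to 22, the Conference on Computer Vision and Pattern Recognition (CVPR 2023), one of the premier events in computer vision, will be held in Vancouver, Canada. CVPR is a top international conference in computer vision, pattern recognition, and artificial intelligence, and is a Class A international academic conference recommended by the China Computer Federation (CCF). This year's acceptance rate is 25.78%.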

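Researchers from Tel Aviv University, Google, and Hugging Face present the tutorial "Understanding and Interpreting Attention in Vision," which is well worth following!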

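In this tutorial, we explore how attention is used in vision. From left to right: (i) attention can be used to explain a model's predictions (e.g., CLIP for an image-text pair); (ii) examples of probing attention-based models; (iii) the cross-attention maps of multimodal models can be used to guide generative models (e.g., mitigating neglect in Stable Diffusion).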
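As a rough illustration of point (i), the sketch below computes single-head scaled dot-product attention and reads the class-token row as a heatmap over image patches. It is a toy example, not the tutorial's code: the random queries and keys, the 4x4 patch grid, the convention that token 0 is the class token, and the helper names `softmax` and `cls_attention_map` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def cls_attention_map(queries, keys, grid_size):
    """Return the class token's attention over patches as a 2D heatmap.

    queries, keys: arrays of shape (1 + grid_size**2, dim); token 0 is
    assumed to be the class token, the rest are image patches.
    """
    dim = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(dim)   # scaled dot-product scores
    attn = softmax(scores, axis=-1)            # each row sums to 1
    cls_to_patches = attn[0, 1:]               # drop class-to-class weight
    cls_to_patches = cls_to_patches / cls_to_patches.sum()
    return cls_to_patches.reshape(grid_size, grid_size)


# Toy inputs (hypothetical): random projections stand in for a real model.
rng = np.random.default_rng(0)
grid_size, dim = 4, 8
num_tokens = 1 + grid_size ** 2
queries = rng.normal(size=(num_tokens, dim))
keys = rng.normal(size=(num_tokens, dim))
heatmap = cls_attention_map(queries, keys, grid_size)
```

With a real Vision Transformer, the queries and keys would come from a chosen layer and head, and the heatmap would be upsampled to the image size before being overlaid on the input.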

* Overview of typical ways of interpreting CNNs: GradCAM, LRP, grad x input, SHAP, LIME, Integrated Gradients
* Short introduction to Transformers
  * The attention mechanism
  * Positional encoding
  * Integrating attention maps from different modalities via cross-attention
* Probing Transformers: understanding what Transformers learn from images [1, 2, 3, 4, 5]
  * Mean attention distance (relative receptive field)
  * Centered kernel alignment
  * The role of skip connections
* Why do we need different methods to interpret Transformers?
  * Is attention an explanation? If so, under what conditions?
* Explaining predictions made by Transformers (XAI)
  * Algorithms to explain attention
    * Attention rollout [6]
    * Attention flow (briefly, since it is computationally expensive) [6]
    * Transformer Interpretability Beyond Attention Visualization [7]
    * Understanding what Transformers learn from multiple modalities [8]
  * Attention as explanation
    * Class attention [10]
    * Attention for semantic segmentation (DINO, [11])
* Leveraging attention for downstream tasks
  * Ron Mokady sharing his seminal research on employing attention for text-based image editing ([12, 13])
  * Using cross-attention to guide text-to-image generation models ([14])
* Open questions
  * How do we evaluate these explainability methods?
  * Are smaller Transformer models better than the larger ones as far as explainability is concerned?
  * Is attention a good way to interpret Transformers in the first place?
  * Why do methods for CNNs not perform well on Transformers? (e.g., the adaptation of GradCAM to Transformers does not seem to work, even though the intuition is rather similar)
* Conclusion and Q&A
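Of the algorithms named in the outline, attention rollout is simple enough to sketch in a few lines. The version below follows the usual description of the method: average the heads in each layer, mix in the identity matrix to account for skip connections, renormalize the rows, and multiply the per-layer matrices from the first layer to the last. It is a minimal sketch on toy data, not the tutorial's implementation; mean head fusion, the random attention tensors, and the helper names `attention_rollout` and `random_attention` are assumptions.

```python
import numpy as np


def attention_rollout(attentions):
    """Combine per-layer attention into one token-to-token relevance matrix.

    attentions: list of arrays, one per layer (first layer first), each of
    shape (heads, tokens, tokens) with rows that sum to 1.
    """
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity.copy()
    for layer_attn in attentions:
        fused = layer_attn.mean(axis=0)            # assumption: average the heads
        fused = fused + identity                   # account for the skip connection
        fused = fused / fused.sum(axis=-1, keepdims=True)
        rollout = fused @ rollout                  # propagate through this layer
    return rollout


def random_attention(rng, heads, tokens):
    """Hypothetical stand-in for a model's attention: random row-stochastic maps."""
    logits = rng.normal(size=(heads, tokens, tokens))
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
layers = [random_attention(rng, heads=4, tokens=5) for _ in range(3)]
rollout = attention_rollout(layers)

# Relevance of each non-class token to the class token (token 0 by assumption).
cls_relevance = rollout[0, 1:]
```

Because every per-layer matrix is row-stochastic, the rolled-out matrix is too, so each row can be read as a distribution over input tokens. For a Vision Transformer, `cls_relevance` would be reshaped to the patch grid to produce a saliency map.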

References:

[1] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, Chefer et al.
[2] Do Vision Transformers See Like Convolutional Neural Networks?, Raghu et al.
[3] What do Vision Transformers Learn? A Visual Exploration, Ghiasi et al.
[4] Quantifying Attention Flow in Transformers, Abnar et al.
[5] Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models, Chefer et al.
[6] Prompt-to-Prompt Image Editing with Cross-Attention Control, Hertz et al.
[7] Null-text Inversion for Editing Real Images using Guided Diffusion Models, Mokady et al.


CVPR 2023


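CVPR 2023 will be held June 18 to 22 at the Vancouver Convention Centre. CVPR stands for the IEEE Conference on Computer Vision and Pattern Recognition. Organized by IEEE, it is a top conference in computer vision and pattern recognition, and its main focus is computer vision and pattern recognition technology. CVPR 2023 received 9,155 submissions, a record and 12% more than last year, and accepted 2,360 papers, for an acceptance rate of 25.78%. By comparison, last year there were more than 8,100 valid submissions, of which 2,067 were accepted, an acceptance rate of 25%.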
