SWAT: Spatial Structure Within and Among Tokens - Zhuanzhi Papers


Tokenizer · MoDELS · DeiT · Pairwise · Mixing

November 26, 2021

SWAT: Spatial Structure Within and Among Tokens


Kumara Kahatapitiya, Michael S. Ryoo

Modeling visual data as tokens (i.e., image patches) and applying attention mechanisms or feed-forward networks on top of them has proven highly effective in recent years. The common pipeline in such approaches includes a tokenization method, followed by a set of layers/blocks for information mixing, both within tokens and among tokens. In common practice, image patches are flattened when converted into tokens, discarding the spatial structure within each patch. Next, a module such as multi-head self-attention captures the pairwise relations among the tokens and mixes them. In this paper, we argue that models can achieve significant gains when spatial structure is preserved in tokenization and explicitly used in the mixing stage. We propose two key contributions: (1) Structure-aware Tokenization and (2) Structure-aware Mixing, both of which can be combined with existing models with minimal effort. We introduce a family of models (SWAT), showing improvements over the likes of DeiT, MLP-Mixer and Swin Transformer across multiple benchmarks, including ImageNet classification and ADE20K segmentation. Our code and models will be released online.
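To make the contrast in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (patchify, flat_tokens, structured_tokens, mix_among_tokens), the average-pooling used to keep a small spatial grid inside each token, and the projection-free single-head attention are all assumptions chosen for illustration only.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping (patch, patch, C) patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, C)
    return x.reshape(gh * gw, patch, patch, c)


def flat_tokens(image, patch):
    """Common practice: flatten each patch, discarding its 2-D layout."""
    p = patchify(image, patch)
    return p.reshape(p.shape[0], -1)  # (N, patch * patch * C)


def structured_tokens(image, patch, grid):
    """Hypothetical structure-preserving variant (an assumption, not SWAT itself):
    average-pool each patch to a (grid, grid, C) map so that every token
    still carries a small spatial layout."""
    assert patch % grid == 0, "grid must divide the patch size"
    p = patchify(image, patch)
    n, _, _, c = p.shape
    s = patch // grid
    return p.reshape(n, grid, s, grid, s, c).mean(axis=(2, 4))  # (N, grid, grid, C)


def mix_among_tokens(tokens):
    """Single-head self-attention without learned projections (a simplification):
    pairwise similarities among tokens are turned into mixing weights."""
    x = tokens.reshape(tokens.shape[0], -1)
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights, (weights @ x).reshape(tokens.shape)


# Toy 8x8 RGB image with values in [0, 1).
image = np.arange(8 * 8 * 3, dtype=np.float64).reshape(8, 8, 3) / (8 * 8 * 3)

flat = flat_tokens(image, patch=4)                  # 4 tokens, each a 48-vector
struct = structured_tokens(image, patch=4, grid=2)  # 4 tokens, each a 2x2x3 map
weights, mixed = mix_among_tokens(struct)           # mixing keeps the token shape
```

In this sketch the flattened tokens have shape (4, 48), while the structured tokens have shape (4, 2, 2, 3), so a later layer could still apply a spatial operation inside each token. How SWAT actually preserves and uses that structure is specified in the paper, not here.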

