HRFormer: High-Resolution Transformer for Dense Prediction - Zhuanzhi Papers

October 21, 2021

HRFormer: High-Resolution Transformer for Dense Prediction

Yuhui Yuan,Rao Fu,Lang Huang,Weihong Lin,Chao Zhang,Xilin Chen,Jingdong Wang

From arXiv. Accepted at NeurIPS 2021.

We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost. We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet), along with local-window self-attention that performs self-attention over small non-overlapping image windows, for improving the memory and computation efficiency. In addition, we introduce a convolution into the FFN to exchange information across the disconnected image windows. We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks, e.g., HRFormer outperforms Swin transformer by $1.3$ AP on COCO pose estimation with $50\%$ fewer parameters and $30\%$ fewer FLOPs. Code is available at: https://github.com/HRNet/HRFormer.
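
The local-window self-attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the HRFormer implementation: the function names (`window_partition`, `window_merge`, `local_window_attention`), the single-head attention, the random projection matrices, and the toy 8x8 feature map are all assumptions made for illustration, and the multi-resolution branches and positional encodings are omitted. The sketch partitions a feature map into non-overlapping windows, runs self-attention inside each window, and then checks that perturbing one pixel only changes outputs inside that pixel's own window.

```python
import numpy as np

def window_partition(x, k):
    """Split an (H, W, C) feature map into non-overlapping k x k windows.

    Returns an array of shape (num_windows, k*k, C).
    """
    h, w, c = x.shape
    assert h % k == 0 and w % k == 0, "H and W must be divisible by k"
    x = x.reshape(h // k, k, w // k, k, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (H/k, W/k, k, k, C)
    return x.reshape(-1, k * k, c)

def window_merge(windows, h, w, k):
    """Inverse of window_partition: restore the (H, W, C) layout."""
    c = windows.shape[-1]
    x = windows.reshape(h // k, w // k, k, k, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (H/k, k, W/k, k, C)
    return x.reshape(h, w, c)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_window_attention(x, k, wq, wk, wv):
    """Single-head self-attention computed independently in each window."""
    h, w, _ = x.shape
    win = window_partition(x, k)              # (N, k*k, C)
    q, key, v = win @ wq, win @ wk, win @ wv
    scores = q @ key.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores) @ v                 # (N, k*k, C)
    return window_merge(out, h, w, k)

# Toy sizes, chosen only for illustration.
H, W, C, K = 8, 8, 4, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((H, W, C))
wq, wk, wv = (rng.standard_normal((C, C)) for _ in range(3))

y = local_window_attention(x, K, wq, wk, wv)

# Perturb a single pixel in the top-left window and recompute.
x2 = x.copy()
x2[0, 0] += 1.0
y2 = local_window_attention(x2, K, wq, wk, wv)

# Boolean (H, W) map of positions whose output changed.
changed = np.abs(y2 - y).max(axis=-1) > 1e-12
```

In this toy setting, `changed` is true only inside the top-left 4x4 window, which shows why the windows are "disconnected" and why the abstract adds a convolution in the FFN to pass information between them. The cost saving is also visible from the sizes: four windows of 16 tokens need 4 x 16^2 = 1024 pairwise attention scores, whereas global attention over all 64 tokens needs 64^2 = 4096.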
