k- means 遮罩变换器 (k-means Mask Transformer) - 专知论文

会员服务 ·

0

Learning · 变换 · state-of-the-art · Vision · 掩码 ·

2022 年 7 月 8 日

k-means Mask Transformer

翻译：k- means 遮罩变换器

Qihang Yu,Huiyu Wang,Siyuan Qiao,Maxwell Collins,Yukun Zhu,Hatwig Adam,Alan Yuille,Liang-Chieh Chen

from arxiv, ECCV 2022. Codes and models are available at https://github.com/google-research/deeplab2

The rise of transformers in vision tasks not only advances network backbone designs, but also starts a brand-new page to achieve end-to-end image recognition (e.g., object detection and panoptic segmentation). Originated from Natural Language Processing (NLP), transformer architectures, consisting of self-attention and cross-attention, effectively learn long-range interactions between elements in a sequence. However, we observe that most existing transformer-based vision models simply borrow the idea from NLP, neglecting the crucial difference between languages and images, particularly the extremely large sequence length of spatially flattened pixel features. This subsequently impedes the learning in cross-attention between pixel features and object queries. In this paper, we rethink the relationship between pixels and object queries and propose to reformulate the cross-attention learning as a clustering process. Inspired by the traditional k-means clustering algorithm, we develop a k-means Mask Xformer (kMaX-DeepLab) for segmentation tasks, which not only improves the state-of-the-art, but also enjoys a simple and elegant design. As a result, our kMaX-DeepLab achieves a new state-of-the-art performance on COCO val set with 58.0% PQ, and Cityscapes val set with 68.4% PQ, 44.0% AP, and 83.5% mIoU without test-time augmentation or external dataset. We hope our work can shed some light on designing transformers tailored for vision tasks. Code and models are available at https://github.com/google-research/deeplab2

翻译：变压器在视觉任务中的崛起不仅提升了网络主干设计,而且开始了一个崭新页面,以实现端到端图像识别(例如,物体探测和全光分割)。来自自然语言处理(NLP)的变压器结构,由自我注意和交叉注意组成,有效地学习各元素之间按顺序进行远程互动。然而,我们看到,大多数基于变压器的视觉模型只是借用NLP的理念,忽略了语言和图像之间的关键差异,特别是空间平整像素特性的超大序列长度。这随后阻碍了在像素特性和对象查询之间的交叉注意学习。在本文件中,我们重新思考像素和对象查询之间的关系,并提议重新将交叉注意学习作为一个组合过程。受传统的 k means 组合算法的启发,我们为分解任务开发了 kpoint Mask Xfor (kMaX-DescausalLab) (kMax-DevelopLab), 不仅改进了空间平坦平坦平坦的像素模型,而且还实现了一种简单和优雅度的 Cal-al-laxalal-la la la-deal-deal-la maxal-la maxal maxal) maxal maxal maxal maxal maxal maxalalalalalalalalalalalalalalalalalalalalal Qal maxal maxal maxal maxal maxalal masal maxal maxal 。

0

相关内容

Learning

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【ICLR2022】UniFormer：无缝集成 Transformer，更高效的时空表征学习框架

【ICLR2022】UniFormer：无缝集成 Transformer，更高效的时空表征学习框架

专知会员服务

50+阅读 · 2022年2月16日

【Facebook-Ishan Mishra】计算机视觉自监督学习，92页ppt

专知会员服务

36+阅读 · 2021年7月7日

【CVPR2021】用Transformers无监督预训练进行目标检测

【CVPR2021】用Transformers无监督预训练进行目标检测

专知会员服务

58+阅读 · 2021年3月3日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【推荐】全卷积语义分割综述

【推荐】全卷积语义分割综述

机器学习研究会

19+阅读 · 2017年8月31日

内质网应激通过JAZF1/AKT/mTOR信号途径调控巨噬细胞自噬影响易损斑块的作用研究

国家自然科学基金

0+阅读 · 2015年12月31日

新型给受体共轭聚合物半导体材料的设计、绿色合成与光电性能研究

国家自然科学基金

0+阅读 · 2014年12月31日

Rac1信号通路在糖尿病肾病足细胞转分化中的作用及机制

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

APE1在高草酸诱导肾小管上皮细胞线粒体氧化应激损伤中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

有理曲面及其相关问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

哈密顿系统与辛几何中的闭轨道

国家自然科学基金

0+阅读 · 2012年12月31日

益气活血方通过"DAMPs-PRRs-巨噬细胞"途径影响动脉粥样硬化斑块易损性的机制

国家自然科学基金

0+阅读 · 2011年12月31日

生物分子连续模型中的数值方法与程序实现

国家自然科学基金

0+阅读 · 2009年12月31日

p进表示的伽罗瓦上同调

国家自然科学基金

0+阅读 · 2008年12月31日

A Circular Window-based Cascade Transformer for Online Action Detection

Arxiv

0+阅读 · 2022年8月30日

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

Arxiv

0+阅读 · 2022年8月26日

EDTER: Edge Detection with Transformer

Arxiv

11+阅读 · 2022年3月16日

Masked Autoencoders Are Scalable Vision Learners

Arxiv

27+阅读 · 2021年11月11日

SiT: Self-supervised vIsion Transformer

Arxiv

19+阅读 · 2021年4月8日

Transformer Tracking

Arxiv

17+阅读 · 2021年3月29日

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Arxiv

10+阅读 · 2021年2月11日

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Arxiv

19+阅读 · 2020年11月18日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【ICLR2022】UniFormer：无缝集成 Transformer，更高效的时空表征学习框架

【ICLR2022】UniFormer：无缝集成 Transformer，更高效的时空表征学习框架

专知会员服务

50+阅读 · 2022年2月16日

【Facebook-Ishan Mishra】计算机视觉自监督学习，92页ppt

专知会员服务

36+阅读 · 2021年7月7日

【CVPR2021】用Transformers无监督预训练进行目标检测

【CVPR2021】用Transformers无监督预训练进行目标检测

专知会员服务

58+阅读 · 2021年3月3日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

GPT-5如何对齐？从硬性拒绝到安全完成：走向以输出为中心的安全训练

【伯克利博士论文】超越人类监督的视觉智能

【ICCV2025】SO(3) 上连续非保守动力系统的预测

2025年中国数据要素行业发展研究报告

相关资讯

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【推荐】全卷积语义分割综述

【推荐】全卷积语义分割综述

机器学习研究会

19+阅读 · 2017年8月31日

相关论文

A Circular Window-based Cascade Transformer for Online Action Detection

Arxiv

0+阅读 · 2022年8月30日

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

Arxiv

0+阅读 · 2022年8月26日

EDTER: Edge Detection with Transformer

Arxiv

11+阅读 · 2022年3月16日

Masked Autoencoders Are Scalable Vision Learners

Arxiv

27+阅读 · 2021年11月11日

SiT: Self-supervised vIsion Transformer

Arxiv

19+阅读 · 2021年4月8日

Transformer Tracking

Arxiv

17+阅读 · 2021年3月29日

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Arxiv

10+阅读 · 2021年2月11日

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Arxiv

19+阅读 · 2020年11月18日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

相关基金

内质网应激通过JAZF1/AKT/mTOR信号途径调控巨噬细胞自噬影响易损斑块的作用研究

国家自然科学基金

0+阅读 · 2015年12月31日

新型给受体共轭聚合物半导体材料的设计、绿色合成与光电性能研究

国家自然科学基金

0+阅读 · 2014年12月31日

Rac1信号通路在糖尿病肾病足细胞转分化中的作用及机制

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

APE1在高草酸诱导肾小管上皮细胞线粒体氧化应激损伤中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

有理曲面及其相关问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

哈密顿系统与辛几何中的闭轨道

国家自然科学基金

0+阅读 · 2012年12月31日

益气活血方通过"DAMPs-PRRs-巨噬细胞"途径影响动脉粥样硬化斑块易损性的机制

国家自然科学基金

0+阅读 · 2011年12月31日

生物分子连续模型中的数值方法与程序实现

国家自然科学基金

0+阅读 · 2009年12月31日

p进表示的伽罗瓦上同调

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员