Vision transformers (ViT) have shown promise in various vision tasks, while the U-Net based on a convolutional neural network (CNN) remains dominant in diffusion models. We design a simple and general ViT-based architecture (named U-ViT) for image generation with diffusion models. U-ViT is characterized by treating all inputs, including the time, condition, and noisy image patches, as tokens, and by employing long skip connections between shallow and deep layers. We evaluate U-ViT on unconditional and class-conditional image generation, as well as text-to-image generation, where U-ViT is comparable, if not superior, to a CNN-based U-Net of a similar size. In particular, latent diffusion models with U-ViT achieve record-breaking FID scores of 2.29 in class-conditional image generation on ImageNet 256×256 and 5.48 in text-to-image generation on MS-COCO, among methods that do not access large external datasets during the training of generative models. Our results suggest that, for diffusion-based image modeling, the long skip connection is crucial, while the down-sampling and up-sampling operators in CNN-based U-Net are not always necessary. We believe that U-ViT can provide insights for future research on backbones in diffusion models and benefit generative modeling on large-scale cross-modality datasets.
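To make the two architectural ideas concrete, below is a minimal PyTorch sketch of the U-ViT design described above: the time step, the class condition, and the noisy image patches are all embedded as tokens in one sequence, and each block in the deep half receives a long skip connection from its mirrored shallow block (concatenation followed by a linear projection). All names here are illustrative, not the authors' reference implementation; the scalar time embedding and the absence of the paper's final convolutional layer are simplifications for brevity.

```python
# Minimal U-ViT sketch (assumed PyTorch; illustrative, not the official code).
import torch
import torch.nn as nn

class Block(nn.Module):
    """Standard pre-norm transformer block."""
    def __init__(self, dim, heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class UViT(nn.Module):
    def __init__(self, img_size=32, patch=2, channels=4, dim=512,
                 depth=12, heads=8, num_classes=1000):
        super().__init__()
        self.patch = patch
        num_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Linear(patch * patch * channels, dim)
        # Simplified scalar time embedding (the paper uses a sinusoidal one).
        self.time_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(),
                                        nn.Linear(dim, dim))
        self.label_embed = nn.Embedding(num_classes, dim)
        # +2 positions for the time token and the condition token.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, dim))
        half = depth // 2
        self.in_blocks = nn.ModuleList(Block(dim, heads) for _ in range(half))
        self.mid_block = Block(dim, heads)
        self.out_blocks = nn.ModuleList(Block(dim, heads) for _ in range(half))
        # Long skips: concat shallow + deep tokens, project back to dim.
        self.skip_proj = nn.ModuleList(nn.Linear(2 * dim, dim) for _ in range(half))
        self.out = nn.Linear(dim, patch * patch * channels)

    def forward(self, x, t, y):
        B, C, H, W = x.shape
        p = self.patch
        # (B, C, H, W) -> (B, num_patches, p*p*C): patchify the noisy image.
        patches = (x.unfold(2, p, p).unfold(3, p, p)
                    .permute(0, 2, 3, 4, 5, 1).reshape(B, -1, p * p * C))
        tokens = self.patch_embed(patches)
        t_tok = self.time_embed(t.view(B, 1).float()).unsqueeze(1)  # time as a token
        y_tok = self.label_embed(y).unsqueeze(1)                    # condition as a token
        h = torch.cat([t_tok, y_tok, tokens], dim=1) + self.pos_embed
        skips = []
        for blk in self.in_blocks:            # shallow half: remember activations
            h = blk(h)
            skips.append(h)
        h = self.mid_block(h)
        for blk, proj in zip(self.out_blocks, self.skip_proj):
            h = proj(torch.cat([h, skips.pop()], dim=-1))  # long skip connection
            h = blk(h)
        # Drop the time/condition tokens; predict noise per image patch.
        return self.out(h[:, 2:])
```

As a usage sketch, `UViT()(torch.randn(4, 4, 32, 32), torch.randint(0, 1000, (4,)), torch.randint(0, 1000, (4,)))` returns per-patch noise predictions of shape `(4, 256, 16)`, which would be un-patchified back to latent-image shape in a full pipeline. Note there is no down-sampling or up-sampling anywhere: the token sequence keeps a constant resolution, and only the long skip connections carry shallow features to the deep layers, matching the abstract's claim about what is and is not essential.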