The landscape of image generation has been forever changed by open-vocabulary diffusion models. At their core, however, these models rely on transformers, which makes generation slow. More efficient transformer implementations have emerged to increase throughput, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by merging redundant tokens, exploiting the natural redundancy present in generated images. After making several diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high-quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.
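As a concrete illustration of how a patch like this is meant to be used, the sketch below applies token merging to an off-the-shelf text-to-image pipeline. It is a minimal example, assuming the tomesd package from the repository linked above together with Hugging Face's diffusers library; the model identifier and the 0.5 merging ratio are illustrative choices, and the exact apply_patch signature should be confirmed against the repository's documentation.

    import torch
    import tomesd
    from diffusers import StableDiffusionPipeline

    # Load a standard Stable Diffusion pipeline (the model id is illustrative).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Patch the pipeline so that up to 50% of tokens are merged inside the
    # U-Net's transformer blocks; no retraining is required, and the patch
    # can be reverted with tomesd.remove_patch(pipe).
    tomesd.apply_patch(pipe, ratio=0.5)

    # Generation now runs with fewer tokens per attention layer.
    image = pipe("a photo of an astronaut riding a horse on mars").images[0]
    image.save("astronaut.png")

Because the patch only changes how tokens flow through the existing attention layers, it composes with other efficiency measures (such as xFormers) rather than replacing them.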