Image colorization has attracted research interest from the community for decades. However, existing methods still struggle to produce satisfactory colorization results for grayscale images due to a lack of human-like global understanding of colors. Recently, large-scale Text-to-Image (T2I) models have been exploited to transfer semantic information from text prompts to the image domain, where text provides global control over the semantic objects in the image. In this work, we introduce a colorization model that piggybacks on an existing, powerful T2I diffusion model. Our key idea is to exploit the color prior knowledge in the pre-trained T2I diffusion model for realistic and diverse colorization. We design a diffusion guider that incorporates the pre-trained weights of the latent diffusion model to output a latent color prior conforming to the visual semantics of the grayscale input. A lightness-aware VQVAE then generates the colorized result with pixel-perfect alignment to the given grayscale image. Our model can also perform conditional colorization with additional inputs (e.g., user hints and text prompts). Extensive experiments show that our method achieves state-of-the-art performance in terms of perceptual quality.
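To make the two-stage pipeline concrete, below is a minimal sketch of how a diffusion guider could denoise a latent color prior conditioned on the grayscale input, with a lightness-aware decoder then fusing that prior with the full-resolution lightness. All module names (DiffusionGuider, LightnessAwareDecoder), channel counts, and the simplified sampling loop are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the pipeline described in the abstract; module names,
# shapes, and the toy sampler are assumptions, not the authors' code.
import torch
import torch.nn as nn

class DiffusionGuider(nn.Module):
    """Predicts noise for the latent color prior, conditioned on a low-res
    grayscale map (stand-in for the pre-trained latent diffusion weights)."""
    def __init__(self, latent_ch=4, cond_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch + cond_ch, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_ch, 3, padding=1),
        )

    def forward(self, z_t, gray_lowres):
        return self.net(torch.cat([z_t, gray_lowres], dim=1))

class LightnessAwareDecoder(nn.Module):
    """VQVAE-style decoder that predicts chroma from the color latent and
    reuses the input lightness, keeping the output pixel-aligned."""
    def __init__(self, latent_ch=4):
        super().__init__()
        self.up = nn.Sequential(
            nn.Conv2d(latent_ch, 64, 3, padding=1), nn.SiLU(),
            nn.Upsample(scale_factor=8, mode="nearest"),
            nn.Conv2d(64, 2, 3, padding=1),  # predict ab chroma channels
        )

    def forward(self, z0, gray):
        ab = self.up(z0)
        return torch.cat([gray, ab], dim=1)  # Lab-style: input L + predicted ab

def colorize(gray, guider, decoder, steps=10):
    """Toy denoising loop: refine a random latent into a color prior,
    then decode it against the grayscale lightness."""
    b, _, h, w = gray.shape
    z = torch.randn(b, 4, h // 8, w // 8)
    gray_lowres = nn.functional.interpolate(gray, size=(h // 8, w // 8))
    for _ in range(steps):
        eps = guider(z, gray_lowres)
        z = z - eps / steps  # crude update standing in for a real sampler
    return decoder(z, gray)

gray = torch.rand(1, 1, 256, 256)  # L channel of the grayscale input
out = colorize(gray, DiffusionGuider(), LightnessAwareDecoder())
print(out.shape)  # torch.Size([1, 3, 256, 256])
```

Predicting only the chroma channels while copying the input lightness through the decoder is one simple way to realize the pixel-perfect alignment the abstract claims, since the L channel of the output is the input itself by construction.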