StrulerDALLE: 语言指导样式转让,使用大比例生成模型的矢量量化控制器</s> (StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model) - 专知论文

会员服务 ·

0

词元分析器 · MoDELS · 生成模型 · HTTPS · SimPLe ·

2023 年 3 月 16 日

StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model

翻译：StrulerDALLE: 语言指导样式转让,使用大比例生成模型的矢量量化控制器

Zipeng Xu,Enver Sangineto,Nicu Sebe

Despite the progress made in the style transfer task, most previous work focus on transferring only relatively simple features like color or texture, while missing more abstract concepts such as overall art expression or painter-specific traits. However, these abstract semantics can be captured by models like DALL-E or CLIP, which have been trained using huge datasets of images and textual documents. In this paper, we propose StylerDALLE, a style transfer method that exploits both of these models and uses natural language to describe abstract art styles. Specifically, we formulate the language-guided style transfer task as a non-autoregressive token sequence translation, i.e., from input content image to output stylized image, in the discrete latent space of a large-scale pretrained vector-quantized tokenizer. To incorporate style information, we propose a Reinforcement Learning strategy with CLIP-based language supervision that ensures stylization and content preservation simultaneously. Experimental results demonstrate the superiority of our method, which can effectively transfer art styles using language instructions at different granularities. Code is available at https://github.com/zipengxuc/StylerDALLE.

翻译：尽管在风格转换任务方面取得了进展,但大多数先前的工作重心只是相对简单的特征,如颜色或纹理等,而忽略了更抽象的概念,如总体艺术表达或画家特质等。然而,这些抽象的语义可以被DALL-E或CLIP等模型所捕捉,这些模型经过了使用图像和文本文档的巨大数据集的培训。在本文中,我们提出了StyrrDALLE,这是一个利用这些模型和自然语言来描述抽象艺术风格的风格转换方法。具体地说,我们把语言指导风格传输任务设计成一种非不显性符号序列翻译,即从输入内容图像到输出模缩化图像,在大规模预先训练的矢量代号的离散潜在空间里。为了纳入样式信息,我们提议与基于CLIP的语言监督一起实施强化学习战略,以确保同步和内容保存。实验结果展示了我们方法的优越性,可以在不同颗粒度上使用语言指令有效地传输艺术风格。代码可在 https://glius/Emplyusub./spencomcodedeal.</s>

0

相关内容

词元分析器

词元分析器

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

58+阅读 · 2020年1月25日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

Maresin1 调控巨噬细胞表型转化在肺损伤中的作用及其机制

国家自然科学基金

0+阅读 · 2015年12月31日

PP2Cδ调控的线粒体ROS通路在肺损伤和炎症中的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

PARP活性对人胚胎干细胞向视网膜色素上皮细胞定向分化的作用

国家自然科学基金

0+阅读 · 2012年12月31日

组蛋白甲基化酶复合物COMPASS催化的H3K4me2,H3K4me3对果蝇发育调控的研究

国家自然科学基金

0+阅读 · 2012年12月31日

8细胞胚胎紧密化中多个信号通路对Ezrin磷酸化的调控

国家自然科学基金

0+阅读 · 2012年12月31日

蛋白异构酶Pin1对视网膜母细胞瘤蛋白Rb的功能调控

国家自然科学基金

0+阅读 · 2011年12月31日

诱导AICD提前发生的机制与fraxinellone的免疫抑制作用

国家自然科学基金

0+阅读 · 2009年12月31日

HAT/HDAC失衡与乙酰化修饰异常：急性肺损伤炎症失控新机制

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

白念珠菌凋亡相关新基因IPF4847的功能研究

国家自然科学基金

0+阅读 · 2008年12月31日

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

Arxiv

1+阅读 · 2023年5月8日

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

Arxiv

0+阅读 · 2023年5月5日

Data Curation for Image Captioning with Text-to-Image Generative Models

Arxiv

0+阅读 · 2023年5月5日

LLM2Loss: Leveraging Language Models for Explainable Model Diagnostics

Arxiv

0+阅读 · 2023年5月4日

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Arxiv

0+阅读 · 2023年5月4日

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Arxiv

25+阅读 · 2023年2月20日

Cross-Modal Discrete Representation Learning

Arxiv

18+阅读 · 2021年6月10日

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Arxiv

12+阅读 · 2021年2月15日

Deformable Style Transfer

Deformable Style Transfer

Arxiv

14+阅读 · 2020年3月24日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

VIP会员

文章信息

相关主题

词元分析器

相关VIP内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

58+阅读 · 2020年1月25日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

热门VIP内容

开通专知VIP会员享更多权益服务

《人与智能体在系统工程建模语言V2任务中的性能表现：基于用户中心化的评估方法》308页

《数据安全国家标准体系（2025版）》征求意见稿

AlphaMosaic：人工智能赋能的作战管理系统

《军事行动中通信平台的战略价值：提升战术效能与作战优势》

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

相关论文

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

Arxiv

1+阅读 · 2023年5月8日

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

Arxiv

0+阅读 · 2023年5月5日

Data Curation for Image Captioning with Text-to-Image Generative Models

Arxiv

0+阅读 · 2023年5月5日

LLM2Loss: Leveraging Language Models for Explainable Model Diagnostics

Arxiv

0+阅读 · 2023年5月4日

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Arxiv

0+阅读 · 2023年5月4日

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Arxiv

25+阅读 · 2023年2月20日

Cross-Modal Discrete Representation Learning

Arxiv

18+阅读 · 2021年6月10日

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Arxiv

12+阅读 · 2021年2月15日

Deformable Style Transfer

Deformable Style Transfer

Arxiv

14+阅读 · 2020年3月24日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

相关基金

Maresin1 调控巨噬细胞表型转化在肺损伤中的作用及其机制

国家自然科学基金

0+阅读 · 2015年12月31日

PP2Cδ调控的线粒体ROS通路在肺损伤和炎症中的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

PARP活性对人胚胎干细胞向视网膜色素上皮细胞定向分化的作用

国家自然科学基金

0+阅读 · 2012年12月31日

组蛋白甲基化酶复合物COMPASS催化的H3K4me2,H3K4me3对果蝇发育调控的研究

国家自然科学基金

0+阅读 · 2012年12月31日

8细胞胚胎紧密化中多个信号通路对Ezrin磷酸化的调控

国家自然科学基金

0+阅读 · 2012年12月31日

蛋白异构酶Pin1对视网膜母细胞瘤蛋白Rb的功能调控

国家自然科学基金

0+阅读 · 2011年12月31日

诱导AICD提前发生的机制与fraxinellone的免疫抑制作用

国家自然科学基金

0+阅读 · 2009年12月31日

HAT/HDAC失衡与乙酰化修饰异常：急性肺损伤炎症失控新机制

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

白念珠菌凋亡相关新基因IPF4847的功能研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员