Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can be leveraged to improve the fashion design process. Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs. Source code and collected multimodal annotations will be publicly released at: https://github.com/aimagelab/multimodal-garment-designer.
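As a rough illustration of how such multimodal conditions can enter a latent diffusion denoiser, the toy sketch below concatenates pose and sketch feature maps with the noisy latent along the channel dimension and injects text embeddings through cross-attention. It is a minimal, self-contained example under assumed shapes and module names; none of the class names, tensor sizes, or hyper-parameters are taken from the released implementation.

```python
# Hypothetical sketch of multimodal conditioning in a latent diffusion denoiser.
# Spatial conditions (pose, sketch) are channel-concatenated with the noisy latent;
# text conditioning enters via cross-attention. Not the authors' architecture.
import torch
import torch.nn as nn

class ToyMultimodalDenoiser(nn.Module):
    def __init__(self, latent_ch=4, cond_ch=4, text_dim=768, hidden=64):
        super().__init__()
        # Pose heatmaps and garment sketch maps are stacked with the noisy latent.
        self.in_conv = nn.Conv2d(latent_ch + cond_ch, hidden, 3, padding=1)
        # Text tokens condition the spatial features through cross-attention.
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=4,
                                                kdim=text_dim, vdim=text_dim,
                                                batch_first=True)
        self.out_conv = nn.Conv2d(hidden, latent_ch, 3, padding=1)

    def forward(self, noisy_latent, spatial_cond, text_tokens):
        x = self.in_conv(torch.cat([noisy_latent, spatial_cond], dim=1))
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)             # (B, H*W, hidden)
        attn_out, _ = self.cross_attn(q, text_tokens, text_tokens)
        x = (q + attn_out).transpose(1, 2).reshape(b, c, h, w)
        return self.out_conv(x)                      # predicted noise

# Toy usage with random tensors standing in for real encoder outputs.
model = ToyMultimodalDenoiser()
latent = torch.randn(1, 4, 32, 32)           # noisy VAE latent
pose_and_sketch = torch.randn(1, 4, 32, 32)  # stacked pose/sketch feature maps
text = torch.randn(1, 77, 768)               # e.g. CLIP-like text embeddings
eps_hat = model(latent, pose_and_sketch, text)
print(eps_hat.shape)                         # torch.Size([1, 4, 32, 32])
```

In practice, one denoising step of this kind would be applied repeatedly inside a standard diffusion sampling loop, with the spatial and textual conditions held fixed across steps.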