CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion - Zhuanzhi Papers


March 21, 2023

CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion


Geonmo Gu, Sanghyuk Chun, Wonjae Kim, HeeJae Jun, Yoohoon Kang, Sangdoo Yun

From arXiv. The first two authors contributed equally; 23 pages, 4.8 MB

This paper proposes a novel diffusion-based model, CompoDiff, for solving Composed Image Retrieval (CIR) with latent diffusion, and presents a newly created dataset of 18 million triplets of reference images, conditions, and corresponding target images to train the model. CompoDiff not only achieves a new zero-shot state-of-the-art on a CIR benchmark such as FashionIQ but also enables a more versatile CIR by accepting various conditions, such as negative text and image mask conditions, which are unavailable with existing CIR methods. In addition, the CompoDiff features lie in the intact CLIP embedding space, so they can be used directly by all existing models that exploit the CLIP space. The code, the training dataset, and the pre-trained weights are available at https://github.com/navervision/CompoDiff

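The abstract notes that CompoDiff features stay in the CLIP embedding space, so a composed query feature can be compared directly with ordinary image embeddings. The following is a minimal sketch of that retrieval step only, assuming the composed query feature has already been produced by some model. The function names (`l2_normalize`, `cosine_similarity`, `retrieve_top_k`) and the 3-dimensional toy vectors are hypothetical illustrations, not part of the CompoDiff code base, and the diffusion model itself is not shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    return [v / norm for v in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve_top_k(query_feature, gallery_features, k=5):
    """Rank gallery embeddings by cosine similarity to the query.

    Returns a list of (gallery_index, score) pairs, best match first.
    """
    scored = [
        (cosine_similarity(query_feature, g), i)
        for i, g in enumerate(gallery_features)
    ]
    scored.sort(key=lambda pair: -pair[0])  # stable sort, highest score first
    return [(i, s) for s, i in scored[:k]]


# Toy stand-ins for image embeddings (real CLIP features are much larger).
gallery = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical composed query feature (reference image plus condition).
composed_query = [0.9, 0.8, 0.0]

results = retrieve_top_k(composed_query, gallery, k=2)
for index, score in results:
    print(index, round(score, 4))
```

Because the ranking uses only cosine similarity, any gallery of precomputed embeddings from the same space could be searched this way without retraining the image encoder.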


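Related content

Image retrieval

Research on image retrieval began in the 1970s. At that time the dominant approach was text-based image retrieval (TBIR), which describes an image's attributes in text, for example a painting's author, period, school, and dimensions. From the 1990s onward, techniques emerged that analyze and retrieve images by their content, such as color, texture, and layout; this is content-based image retrieval (CBIR). CBIR is one form of content-based retrieval (CBR), which also covers retrieval of other kinds of multimedia such as video and audio.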


Related VIP content

CVPR 2023 | Prophet: Using Small Models to Prompt Large Language Models for External-Knowledge Visual Question Answering

Zhuanzhi Member Services

54+ reads · April 1, 2023

AAAI 2022 | Image Difference Captioning with a Pre-training and Fine-tuning Framework

Zhuanzhi Member Services

18+ reads · February 26, 2022

[NeurIPS2021] Recognizing Vector Graphics Without Rasterization

Zhuanzhi Member Services

16+ reads · November 18, 2021

[ICML2020] Pseudo-Masked Language Models for Unified Pre-training

Zhuanzhi Member Services

27+ reads · July 23, 2020

[CVPR2020] Diverse Image Generation via Self-Conditioned GANs

Zhuanzhi Member Services

34+ reads · June 19, 2020

[CVPR2020, Yandex Moscow] Hyperbolic Image Embeddings

Zhuanzhi Member Services

40+ reads · April 12, 2020

[Google, ICLR2020 paper] Pre-training Tasks for Embedding-based Large-scale Retrieval

Zhuanzhi Member Services

28+ reads · February 12, 2020

[A large collection of cross-lingual BERT models] Transfer learning is increasingly going multilingual with language-specific BERT models

Zhuanzhi Member Services

54+ reads · January 30, 2020

[Microsoft Research] IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

Zhuanzhi Member Services

43+ reads · January 28, 2020

[Cross-lingual and cross-domain transfer of NLP models] Transferring NLP models across languages and domains

Zhuanzhi Member Services

43+ reads · November 25, 2019

Related news

Hierarchically Structured Meta-learning

CreateAMind

27+ reads · May 22, 2019

Transferring Knowledge across Learning Processes

CreateAMind

29+ reads · May 18, 2019

Deep Self-Evolution Clustering

我爱读PAMI

15+ reads · April 13, 2019

Unsupervised Meta-Learning for Reinforcement Learning

CreateAMind

18+ reads · January 7, 2019

Unsupervised Learning via Meta-Learning

CreateAMind

43+ reads · January 3, 2019

VAE-related papers: representation learning, part 1

CreateAMind

12+ reads · September 6, 2018

[Paper recommendations] Six recent papers related to visual question answering: deep embedding learning, sentence representation learning, deep feature aggregation, 3D matching, fine-grained text summarization

Zhuanzhi

12+ reads · June 9, 2018

[Paper recommendations] Six recent papers on adversarial autoencoders: multi-scale network node representation, generative adversarial autoencoding, inverse mapping, Wasserstein, conditional adversarial, denoising

Zhuanzhi

20+ reads · April 7, 2018

[Paper recommendations] Five recent papers on image captioning: sentiment, attention mechanisms, remote sensing images, sequence-to-sequence, deep neural architectures

Zhuanzhi

66+ reads · January 31, 2018

Generative Adversarial Text to Image Synthesis: paper walkthrough

Statistical Learning and Visual Computing Group

13+ reads · June 9, 2017

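Related grants

Research on random-noise denoising methods for hyperspectral images based on tensor decomposition

National Natural Science Foundation of China

0+ reads · December 31, 2013

Multiscale computational study of layered MoS2-like low-dimensional thermoelectric materials

National Natural Science Foundation of China

0+ reads · December 31, 2013

Probabilistic graphical inpainting models and algorithms for hyperspectral images driven by joint spatial-spectral correlation

National Natural Science Foundation of China

0+ reads · December 31, 2013

Electromagnetic reflection mechanisms of typical defects in laminated carbon-fiber composites and microwave nondestructive testing

National Natural Science Foundation of China

0+ reads · December 31, 2013

Context-aware 3D modeling by part assembly

National Natural Science Foundation of China

0+ reads · December 31, 2012

Research on topologically robust 3D face recognition algorithms

National Natural Science Foundation of China

0+ reads · December 31, 2012

Spherical re-imaging model for terrestrial laser scanning point clouds and optical images, and automatic registration by multi-sphere combination

National Natural Science Foundation of China

1+ reads · December 31, 2012

Research on polarimetric remote sensing image fusion algorithms based on the Tetrolet transform

National Natural Science Foundation of China

0+ reads · December 31, 2012

Building a calibration library in the near-infrared region for rapid nondestructive grading models of red jujubes from southern Xinjiang

National Natural Science Foundation of China

0+ reads · December 31, 2009

Theory and application of interference-free five-axis tool-position generation for complex compound surfaces

National Natural Science Foundation of China

0+ reads · December 31, 2009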

Related papers

Evaluating Open-Domain Question Answering in the Era of Large Language Models

arXiv

0+ reads · May 11, 2023

Active Retrieval Augmented Generation

arXiv

0+ reads · May 11, 2023

GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents

arXiv

0+ reads · May 10, 2023

Vārta: A Large-Scale Headline-Generation Dataset for Indic Languages

arXiv

0+ reads · May 10, 2023

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

arXiv

0+ reads · May 9, 2023

Region-based Contrastive Pretraining for Medical Image Retrieval with Anatomic Query

arXiv

1+ reads · May 9, 2023

SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

arXiv

0+ reads · May 9, 2023

Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval

arXiv

0+ reads · May 9, 2023

Language Agnostic Multilingual Information Retrieval with Contrastive Learning

arXiv

0+ reads · May 9, 2023

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

arXiv

11+ reads · October 20, 2020
