Modern image retrieval methods typically rely on fine-tuning pre-trained encoders to extract image-level descriptors. However, the most widely used models are pre-trained on ImageNet-1K, which covers only a limited set of classes, so the pre-trained feature representation is not universal enough to generalize well to diverse open-world classes. In this paper, we first cluster the large-scale LAION400M dataset into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model. Because label granularity is ambiguous, the automatically clustered dataset inevitably contains heavy inter-class conflict. To alleviate this conflict, we randomly select a subset of inter-class prototypes when constructing the margin-based softmax loss. To further enhance the low-dimensional feature representation, we randomly select a subset of feature dimensions when calculating the similarities between embeddings and class-wise prototypes. These two random partial selections operate on the class dimension and the feature dimension of the prototype matrix, respectively, making the classification conflict-robust and the feature embedding compact. Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks. The code and pre-trained models are released at https://github.com/deepglint/unicom to facilitate future research.
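To make the pseudo-class construction step concrete, the snippet below sketches the clustering stage. It is a minimal illustration only: it assumes CLIP image and text embeddings have already been extracted, and both the averaging of the two modalities and the use of faiss k-means are expository assumptions, not necessarily the paper's exact recipe.

```python
# Minimal sketch of pseudo-class construction (illustrative assumptions:
# modality fusion by averaging, k-means via faiss).
import numpy as np
import faiss

def cluster_pseudo_classes(image_feats: np.ndarray,
                           text_feats: np.ndarray,
                           num_classes: int = 1_000_000) -> np.ndarray:
    # L2-normalize each modality, then average to form a joint feature
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    joint = ((img + txt) / 2.0).astype(np.float32)

    # k-means over the joint features; each cluster id becomes a pseudo
    # class label (at LAION400M scale this step would run distributed)
    kmeans = faiss.Kmeans(joint.shape[1], num_classes, niter=20, verbose=True)
    kmeans.train(joint)
    _, labels = kmeans.index.search(joint, 1)
    return labels.ravel()  # (N,) pseudo-class id per image-text pair
```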
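The dual random partial selection can likewise be sketched in PyTorch. Everything below is a hedged illustration: the hyperparameters (class_ratio, dim_ratio, margin, scale) and the ArcFace-style additive angular margin are assumptions chosen for clarity, not the paper's reported settings.

```python
# Hedged PyTorch sketch of the dual random partial selection inside a
# margin-based softmax loss (hyperparameters are illustrative assumptions).
import torch
import torch.nn.functional as F

def dual_partial_margin_loss(embeddings: torch.Tensor,  # (B, D) features
                             labels: torch.Tensor,      # (B,) class ids
                             prototypes: torch.Tensor,  # (C, D) learnable
                             class_ratio: float = 0.1,  # negatives sampled
                             dim_ratio: float = 0.5,    # dims kept
                             margin: float = 0.3,
                             scale: float = 32.0) -> torch.Tensor:
    B, D = embeddings.shape
    C = prototypes.shape[0]

    # -- partial selection over the CLASS dimension: always keep the
    # positive classes of the batch, then sample a random subset of the
    # remaining (possibly conflicting) negative prototypes
    pos = labels.unique()                          # sorted unique positives
    perm = torch.randperm(C, device=labels.device)
    neg = perm[~torch.isin(perm, pos)][: int(class_ratio * C)]
    sub_protos = prototypes[torch.cat([pos, neg])]           # (C', D)
    sub_labels = torch.searchsorted(pos, labels)   # labels in sampled space

    # -- partial selection over the FEATURE dimension: compute similarity
    # on a random subset of dimensions so every sub-vector stays useful
    keep = torch.randperm(D, device=labels.device)[: int(dim_ratio * D)]
    e = F.normalize(embeddings[:, keep], dim=1)
    p = F.normalize(sub_protos[:, keep], dim=1)

    # -- margin-based softmax (ArcFace-style additive angular margin)
    cos = (e @ p.t()).clamp(-1 + 1e-7, 1 - 1e-7)             # (B, C')
    target = F.one_hot(sub_labels, cos.shape[1]).bool()
    logits = torch.where(target, torch.cos(torch.acos(cos) + margin), cos)
    return F.cross_entropy(logits * scale, sub_labels)
```

Note that in this sketch only the sampled prototype rows receive gradients in a given step, which is what makes the classification robust to inter-class conflict, while the random dimension subset forces discriminative power into every part of the embedding.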