Masked Vision-Language Transformer in Fashion - Zhuanzhi Papers

October 27, 2022

Masked Vision-Language Transformer in Fashion

Ge-Peng Ji, Mingcheng Zhuge, Dehong Gao, Deng-Ping Fan, Christos Sakaridis, Luc Van Gool

From arXiv; accepted by Machine Intelligence Research (2023).

We present a masked vision-language transformer (MVLT) for fashion-specific multi-modal representation. Technically, we simply use the vision transformer architecture in place of BERT in the pre-training model, making MVLT the first end-to-end framework for the fashion domain. In addition, we design masked image reconstruction (MIR) for a fine-grained understanding of fashion. MVLT is an extensible and convenient architecture that accepts raw multi-modal inputs without extra pre-processing models (e.g., ResNet), implicitly modeling the vision-language alignments. More importantly, MVLT can easily generalize to various matching and generative tasks. Experimental results show clear improvements over Kaleido-BERT, the Fashion-Gen 2018 winner, on retrieval (rank@5: 17%) and recognition (accuracy: 3%) tasks. Code is available at https://github.com/GewelsJI/MVLT.
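The abstract names masked image reconstruction (MIR) without spelling out its mechanics. As a hedged illustration of the general idea (split the raw image into patches, hide some of them, and score a reconstruction only on the hidden part), here is a minimal stdlib-Python sketch of the data side and the loss. The helper names `patchify`, `random_mask`, and `mir_loss`, the single-channel image, the 0.5 mask ratio, and the pixel-level mean-squared-error objective are all assumptions borrowed from generic masked-image-modelling practice, not MVLT's published recipe; the transformer that would produce the predictions is omitted entirely.

```python
import random
from typing import List, Sequence, Set

# Assumption: a single-channel H x W image stored as nested lists of floats.
Image = List[List[float]]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles, each flattened row-major."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def random_mask(num_patches: int, ratio: float, rng: random.Random) -> Set[int]:
    """Choose round(ratio * num_patches) patch indices to hide from the encoder (hypothetical helper)."""
    k = round(ratio * num_patches)
    return set(rng.sample(range(num_patches), k))


def mir_loss(pred: Sequence[Sequence[float]],
             target: Sequence[Sequence[float]],
             masked: Set[int]) -> float:
    """Mean squared error averaged over every pixel of the masked patches only.

    Visible patches contribute nothing, so the model is scored purely on
    what it had to infer. Returns 0.0 when nothing is masked.
    """
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    target = patchify(img, 2)                     # 4 patches of 4 pixels each
    masked = random_mask(len(target), 0.5, rng)   # hide 2 of the 4 patches
    zeros = [[0.0] * 4 for _ in target]           # stand-in for a decoder's output
    print(sorted(masked), mir_loss(zeros, target, masked))
```

Restricting the loss to masked patches is the design choice that makes the task non-trivial: if visible patches were also scored, a model could lower the loss by copying its input.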
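The retrieval gain in the abstract is reported as rank@5. As a rough illustration of what a rank@K score measures, the sketch below computes the fraction of queries whose true match lands in the top K of a query-by-candidate similarity matrix. The function name `rank_at_k`, the layout (the true match for query `i` is candidate `i`), and the tie handling are assumptions for illustration only; the paper's actual evaluation protocol (candidate pool size, how negatives are sampled) is not described here.

```python
from typing import Sequence


def rank_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose true candidate is ranked within the top k.

    Assumption: sim[i][j] is the similarity between query i and candidate j,
    and the correct candidate for query i sits at index i.
    """
    if not sim:
        raise ValueError("similarity matrix must contain at least one query")
    hits = 0
    for i, row in enumerate(sim):
        # 0-based rank = how many candidates score strictly higher than the true one;
        # ties are resolved in the true candidate's favour in this sketch.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


# Usage: three queries, three candidates each.
sim = [[0.9, 0.1, 0.2],
       [0.8, 0.3, 0.1],
       [0.2, 0.5, 0.4]]
print(rank_at_k(sim, 1), rank_at_k(sim, 2))  # prints 0.3333333333333333 1.0
```

Only query 0 ranks its true match first, so rank@1 is 1/3; the other two queries each have one distractor scored higher, so all three are hits at K = 2.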
