Omnivore: 多种视觉模式的单一模式 (Omnivore: A Single Model for Many Visual Modalities) - 专知论文

会员服务 ·

0

Omnivore · MoDELS · 模态 · Performer · Better ·

2022 年 1 月 20 日

Omnivore: A Single Model for Many Visual Modalities

翻译：Omnivore: 多种视觉模式的单一模式

Rohit Girdhar,Mannat Singh,Nikhila Ravi,Laurens van der Maaten,Armand Joulin,Ishan Misra

Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data. Instead, in this paper, we propose a single model which excels at classifying images, videos, and single-view 3D data using exactly the same model parameters. Our 'Omnivore' model leverages the flexibility of transformer-based architectures and is trained jointly on classification tasks from different modalities. Omnivore is simple to train, uses off-the-shelf standard datasets, and performs at-par or better than modality-specific models of the same size. A single Omnivore model obtains 86.0% on ImageNet, 84.1% on Kinetics, and 67.1% on SUN RGB-D. After finetuning, our models outperform prior work on a variety of vision tasks and generalize across modalities. Omnivore's shared visual representation naturally enables cross-modal recognition without access to correspondences between modalities. We hope our results motivate researchers to model visual modalities together.

翻译：先前的工作是单独研究不同的视觉模式, 并开发了不同的图像、视频和 3D 数据识别结构。相反, 在本文中, 我们提出一个单一模型, 这个模型在图像、视频和单一视图 3D 数据分类方面使用完全相同的模型参数。我们的“ omnivore ” 模型利用基于变压器的架构的灵活性, 并共同接受不同模式的分类任务培训。 Omnivore 简单易培训, 使用现成的标准数据集, 并进行类似大小的、与特定模式不同的模型的平行或更好的演化。一个单一的 Omnivore 模型在图像网络上获得86.0%, 动因学获得84.1%, SUN RGB- D 获得67.1%。经过微调, 我们的模型超越了先前关于各种视觉任务的工作, 并且对不同模式进行总体化。 Omnivore 共享的视觉表现自然使得跨模式的识别, 无法获取各种模式之间的对应。我们希望我们的成果能激励研究人员一起模拟视觉模式。

1

相关内容

Omnivore

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

324+阅读 · 2020年11月26日

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

专知会员服务

34+阅读 · 2020年6月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

专知会员服务

92+阅读 · 2019年12月22日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

超长细口径管道内表面三维视觉测量技术及传感器

国家自然科学基金

0+阅读 · 2014年12月31日

基于多波长共振的表面等离激元传感的研究

国家自然科学基金

1+阅读 · 2014年12月31日

基于知识迁移的跨领域人体动作识别

国家自然科学基金

5+阅读 · 2013年12月31日

内蒙古典型草原优势植物种植株生长行为应对降水变化的响应

国家自然科学基金

0+阅读 · 2012年12月31日

结构物入水冲击动力学问题的DSPH计算方法和试验研究

国家自然科学基金

0+阅读 · 2012年12月31日

动态几何分析与三维重建

国家自然科学基金

2+阅读 · 2012年12月31日

复合测头的表面特征尺寸检测方法与技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

芍药切花ACS和ETR1基因的克隆、时空表达及功能分析

国家自然科学基金

0+阅读 · 2012年12月31日

细胞团显微成像与分析用高分辨率X-CT研究

国家自然科学基金

0+阅读 · 2011年12月31日

高效生防菌Bs-916的突变菌株M49表面活性素合成能力降低的内在分子机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

Nested Collaborative Learning for Long-Tailed Visual Recognition

Arxiv

0+阅读 · 2022年4月19日

Multimodal Token Fusion for Vision Transformers

Arxiv

3+阅读 · 2022年4月19日

Dual-Key Multimodal Backdoors for Visual Question Answering

Arxiv

1+阅读 · 2022年4月18日

Multimodal Few-Shot Object Detection with Meta-Learning Based Cross-Modal Prompting

Arxiv

0+阅读 · 2022年4月16日

Model-agnostic Multi-Domain Learning with Domain-Specific Adapters for Action Recognition

Arxiv

0+阅读 · 2022年4月15日

A Survey of Visual Transformers

Arxiv

39+阅读 · 2021年11月11日

Attention Bottlenecks for Multimodal Fusion

Arxiv

31+阅读 · 2021年6月30日

Recent Advances and Trends in Multimodal Deep Learning: A Review

Arxiv

57+阅读 · 2021年5月24日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

Arxiv

23+阅读 · 2020年3月7日

VIP会员

文章信息

相关主题

相关VIP内容

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

324+阅读 · 2020年11月26日

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

专知会员服务

34+阅读 · 2020年6月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

专知会员服务

92+阅读 · 2019年12月22日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【ACMMM2025教程】打击网络虚假信息视频：特征分析、检测与防范，170页ppt

海军无人系统：海上作战的演进而非革命

Nature 子刊 | SciToolAgent:知识图谱引导的科学工具智能体

多媒体顶会ACM Multimedia 2025各大奖项揭晓！格拉斯哥大学等获最佳论文，中科院自动化所等获最佳学生论文

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

相关论文

Nested Collaborative Learning for Long-Tailed Visual Recognition

Arxiv

0+阅读 · 2022年4月19日

Multimodal Token Fusion for Vision Transformers

Arxiv

3+阅读 · 2022年4月19日

Dual-Key Multimodal Backdoors for Visual Question Answering

Arxiv

1+阅读 · 2022年4月18日

Multimodal Few-Shot Object Detection with Meta-Learning Based Cross-Modal Prompting

Arxiv

0+阅读 · 2022年4月16日

Model-agnostic Multi-Domain Learning with Domain-Specific Adapters for Action Recognition

Arxiv

0+阅读 · 2022年4月15日

A Survey of Visual Transformers

Arxiv

39+阅读 · 2021年11月11日

Attention Bottlenecks for Multimodal Fusion

Arxiv

31+阅读 · 2021年6月30日

Recent Advances and Trends in Multimodal Deep Learning: A Review

Arxiv

57+阅读 · 2021年5月24日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

Arxiv

23+阅读 · 2020年3月7日

相关基金

超长细口径管道内表面三维视觉测量技术及传感器

国家自然科学基金

0+阅读 · 2014年12月31日

基于多波长共振的表面等离激元传感的研究

国家自然科学基金

1+阅读 · 2014年12月31日

基于知识迁移的跨领域人体动作识别

国家自然科学基金

5+阅读 · 2013年12月31日

内蒙古典型草原优势植物种植株生长行为应对降水变化的响应

国家自然科学基金

0+阅读 · 2012年12月31日

结构物入水冲击动力学问题的DSPH计算方法和试验研究

国家自然科学基金

0+阅读 · 2012年12月31日

动态几何分析与三维重建

国家自然科学基金

2+阅读 · 2012年12月31日

复合测头的表面特征尺寸检测方法与技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

芍药切花ACS和ETR1基因的克隆、时空表达及功能分析

国家自然科学基金

0+阅读 · 2012年12月31日

细胞团显微成像与分析用高分辨率X-CT研究

国家自然科学基金

0+阅读 · 2011年12月31日

高效生防菌Bs-916的突变菌株M49表面活性素合成能力降低的内在分子机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员