Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

Artificial vision · Diversity · Empirical study · Search · Pre-training

March 31, 2023

Arjun Majumdar,Karmesh Yadav,Sergio Arnaud,Yecheng Jason Ma,Claire Chen,Sneha Silwal,Aryan Jain,Vincent-Pierre Berges,Pieter Abbeel,Jitendra Malik,Dhruv Batra,Yixin Lin,Oleksandr Maksymets,Aravind Rajeswaran,Franziska Meier

From arXiv. Project website: https://eai-vc.github.io

We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs), or visual 'foundation models', for Embodied AI. First, we curate CortexBench, consisting of 17 different tasks spanning locomotion, navigation, and dexterous and mobile manipulation. Next, we systematically evaluate existing PVRs and find that none is universally dominant. To study the effect of pre-training data scale and diversity, we combine over 4,000 hours of egocentric videos from 7 different sources (over 5.6M images) with ImageNet and train vision transformers of different sizes using Masked Auto-Encoding (MAE) on slices of this data. Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average). Our largest model, named VC-1, outperforms all prior PVRs on average but does not universally dominate either. Finally, we show that task- or domain-specific adaptation of VC-1 leads to substantial gains, with VC-1 (adapted) achieving performance competitive with or superior to the best known results on all of the benchmarks in CortexBench. These models required over 10,000 GPU-hours to train and can be found on our website for the benefit of the research community.
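The distinction the abstract draws between a model that is best on average and one that is universally dominant can be made concrete with a small sketch. Everything below is illustrative: the model names, task names, and scores are hypothetical placeholders, not results from the paper, and the plain mean used here is an assumed aggregation that may differ from the one the authors use.

```python
from statistics import mean

# Hypothetical scores (higher is better). These are NOT results from the paper;
# they only illustrate "best on average" versus "universally dominant".
scores = {
    "model_a": {"locomotion": 0.60, "navigation": 0.70, "manipulation": 0.80},
    "model_b": {"locomotion": 0.75, "navigation": 0.60, "manipulation": 0.70},
    "model_c": {"locomotion": 0.70, "navigation": 0.75, "manipulation": 0.78},
}


def best_on_average(table):
    """Return the model with the highest mean score across all tasks."""
    return max(table, key=lambda model: mean(table[model].values()))


def per_task_winners(table):
    """Map each task to the model with the highest score on that task."""
    tasks = next(iter(table.values())).keys()
    return {task: max(table, key=lambda model: table[model][task]) for task in tasks}


def universally_dominant(table):
    """Return the model that is best (or tied for best) on every task, else None."""
    tasks = list(next(iter(table.values())).keys())
    for model, model_scores in table.items():
        if all(
            model_scores[task] >= max(other[task] for other in table.values())
            for task in tasks
        ):
            return model
    return None


if __name__ == "__main__":
    print("best on average:", best_on_average(scores))
    print("per-task winners:", per_task_winners(scores))
    print("universally dominant:", universally_dominant(scores))
```

With these placeholder numbers, one model has the highest mean while a different model wins each individual task, so no model is universally dominant. That is the same qualitative pattern the abstract reports for VC-1.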
