Despite the growing demand for interactive AI systems, there have been few comprehensive studies on human-AI interaction in visual understanding, e.g., segmentation. Inspired by the development of prompt-based universal interfaces for LLMs, this paper presents SEEM, a promptable, interactive model for Segmenting Everything Everywhere all at once in an image. SEEM is designed with four desiderata: i) Versatility: introducing a versatile prompting engine that handles different types of prompts, including points, boxes, scribbles, masks, texts, and referred regions of another image; ii) Compositionality: learning a joint visual-semantic space for visual and textual prompts, so that queries can be composed on the fly at inference, as shown in Fig. 1; iii) Interactivity: incorporating learnable memory prompts that retain dialog history information via mask-guided cross-attention; and iv) Semantic-awareness: using a text encoder to encode text queries and mask labels for open-vocabulary segmentation.
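To make the compositionality desideratum concrete, below is a minimal PyTorch sketch of how encoded visual prompts (points, boxes, scribbles) and textual prompts might be projected into a joint visual-semantic space and composed on the fly before learnable queries attend to them. The `PromptComposer` class, its dimensions, and the DETR-style learnable queries are illustrative assumptions, not SEEM's actual implementation.

```python
import torch
import torch.nn as nn

class PromptComposer(nn.Module):
    """Minimal sketch of composing visual and textual prompts in a shared
    embedding space, in the spirit of SEEM's compositionality desideratum.
    Module names and dimensions are hypothetical, not SEEM's real code."""

    def __init__(self, dim: int = 256, num_queries: int = 100):
        super().__init__()
        # Learnable segmentation queries (assumption: DETR-style queries).
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        # Project visual prompts (e.g., pooled point/box/scribble features)
        # and text prompts (e.g., text-encoder outputs) into one joint space.
        self.visual_proj = nn.Linear(dim, dim)
        self.text_proj = nn.Linear(dim, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, visual_prompts: torch.Tensor,
                text_prompts: torch.Tensor) -> torch.Tensor:
        # visual_prompts: (B, Nv, dim); text_prompts: (B, Nt, dim).
        # Map both prompt types into the joint visual-semantic space, then
        # concatenate so any mix of prompts can be composed at inference.
        prompts = torch.cat(
            [self.visual_proj(visual_prompts), self.text_proj(text_prompts)],
            dim=1,
        )
        q = self.queries.unsqueeze(0).expand(visual_prompts.size(0), -1, -1)
        # Queries cross-attend to the composed prompt sequence; the refined
        # queries would then be decoded into masks downstream.
        out, _ = self.attn(q, prompts, prompts)
        return out  # (B, num_queries, dim)

# Usage: compose one point prompt with one text prompt for a single image.
composer = PromptComposer()
visual = torch.randn(1, 1, 256)   # one encoded point/scribble prompt
text = torch.randn(1, 1, 256)     # one encoded text prompt
queries = composer(visual, text)  # (1, 100, 256)
```

Projecting both prompt types into one space is what lets prompt types mix freely in a single attention pass; a mask-guided variant of this cross-attention (restricting attention with the previous mask) would similarly underpin the interactivity desideratum.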