DetGPT: Detect What You Need via Reasoning - 专知论文

会员服务 ·

0

INTERACT · 目标检测 · MoDELS · 可辨认的 · Automator ·

2023 年 5 月 24 日

DetGPT: Detect What You Need via Reasoning

翻译：暂无翻译

Renjie Pi,Jiahui Gao,Shizhe Diao,Rui Pan,Hanze Dong,Jipeng Zhang,Lewei Yao,Jianhua Han,Hang Xu,Lingpeng Kong,Tong Zhang

In recent years, the field of computer vision has seen significant advancements thanks to the development of large language models (LLMs). These models have enabled more effective and sophisticated interactions between humans and machines, paving the way for novel techniques that blur the lines between human and machine intelligence. In this paper, we introduce a new paradigm for object detection that we call reasoning-based object detection. Unlike conventional object detection methods that rely on specific object names, our approach enables users to interact with the system using natural language instructions, allowing for a higher level of interactivity. Our proposed method, called DetGPT, leverages state-of-the-art multi-modal models and open-vocabulary object detectors to perform reasoning within the context of the user's instructions and the visual scene. This enables DetGPT to automatically locate the object of interest based on the user's expressed desires, even if the object is not explicitly mentioned. For instance, if a user expresses a desire for a cold beverage, DetGPT can analyze the image, identify a fridge, and use its knowledge of typical fridge contents to locate the beverage. This flexibility makes our system applicable across a wide range of fields, from robotics and automation to autonomous driving. Overall, our proposed paradigm and DetGPT demonstrate the potential for more sophisticated and intuitive interactions between humans and machines. We hope that our proposed paradigm and approach will provide inspiration to the community and open the door to more interative and versatile object detection systems. Our project page is launched at detgpt.github.io.

翻译：暂无翻译

0

相关内容

INTERACT

IFIP TC13 Conference on Human-Computer Interaction是人机交互领域的研究者和实践者展示其工作的重要平台。多年来，这些会议吸引了来自几个国家和文化的研究人员。官网链接：http://interact2019.org/

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

专知会员服务

85+阅读 · 2023年6月19日

【Facebook-Ishan Mishra】计算机视觉自监督学习，92页ppt

专知会员服务

36+阅读 · 2021年7月7日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

金柴抗病毒胶囊对流感病毒感染激活的GIR—I信号通路的影响研究

国家自然科学基金

0+阅读 · 2015年12月31日

两种杜鹃花科有毒植物中新颖结构二萜类化合物的发现及其生物活性研究

国家自然科学基金

0+阅读 · 2012年12月31日

健脾解毒方对COX-2激活JNK信号通路介导大肠癌多药耐药的调控研究

国家自然科学基金

0+阅读 · 2012年12月31日

西沙群岛四种海绵新颖结构抗肿瘤活性成分的发现研究

国家自然科学基金

0+阅读 · 2012年12月31日

TGF-β/Smads信号通路在干细胞移植治疗化疗损伤性卵巢早衰中的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

PTTG1基因在大肠癌中的表达及其临床意义

国家自然科学基金

0+阅读 · 2012年12月31日

锂离子电池三维中空锡合金/高分子膜核壳纳米线阵列电极的稳定化研究

国家自然科学基金

0+阅读 · 2011年12月31日

MFAP-1调控信使RNA剪接过程的分子遗传研究

国家自然科学基金

0+阅读 · 2009年12月31日

TGF-beta1调控成釉细胞MMP-20基因转录活性的分子机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

稀土微合金化对Cu-Al共析合金粉末凝固成形作用机理研究

国家自然科学基金

0+阅读 · 2008年12月31日

GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking

Arxiv

0+阅读 · 2023年7月11日

Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond

Arxiv

0+阅读 · 2023年7月11日

Automatically detecting activities of daily living from in-home sensors as indicators of routine behaviour in an older population

Arxiv

0+阅读 · 2023年7月10日

Large Language Models for Supply Chain Optimization

Arxiv

0+阅读 · 2023年7月8日

On the Evolution of (Hateful) Memes by Means of Multimodal Contrastive Learning

Arxiv

0+阅读 · 2023年7月7日

Towards Open World Object Detection

Arxiv

13+阅读 · 2021年3月3日

Image/Video Deep Anomaly Detection: A Survey

Arxiv

16+阅读 · 2021年3月2日

Zero-Shot Object Detection by Hybrid Region Embedding

Arxiv

19+阅读 · 2018年5月17日

DOTA: A Large-scale Dataset for Object Detection in Aerial Images

Arxiv

19+阅读 · 2018年1月27日

Attention Is All You Need

Arxiv

27+阅读 · 2017年12月6日

VIP会员

文章信息

相关主题

相关VIP内容

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

专知会员服务

85+阅读 · 2023年6月19日

【Facebook-Ishan Mishra】计算机视觉自监督学习，92页ppt

专知会员服务

36+阅读 · 2021年7月7日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《代码、指挥与冲突：描绘军事人工智能的未来》报告

【斯坦福博士论文】面向地理空间数据的多模态与多尺度建模：时空生成式人工智能

美国启动“自有军事人工智能计划”：采用谷歌Gemini以推动全军人工智能应用

《创新与适应性作为军事成功的关键因素：来自俄乌战争的战略洞见》报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

相关论文

GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking

Arxiv

0+阅读 · 2023年7月11日

Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond

Arxiv

0+阅读 · 2023年7月11日

Automatically detecting activities of daily living from in-home sensors as indicators of routine behaviour in an older population

Arxiv

0+阅读 · 2023年7月10日

Large Language Models for Supply Chain Optimization

Arxiv

0+阅读 · 2023年7月8日

On the Evolution of (Hateful) Memes by Means of Multimodal Contrastive Learning

Arxiv

0+阅读 · 2023年7月7日

Towards Open World Object Detection

Arxiv

13+阅读 · 2021年3月3日

Image/Video Deep Anomaly Detection: A Survey

Arxiv

16+阅读 · 2021年3月2日

Zero-Shot Object Detection by Hybrid Region Embedding

Arxiv

19+阅读 · 2018年5月17日

DOTA: A Large-scale Dataset for Object Detection in Aerial Images

Arxiv

19+阅读 · 2018年1月27日

Attention Is All You Need

Arxiv

27+阅读 · 2017年12月6日

相关基金

金柴抗病毒胶囊对流感病毒感染激活的GIR—I信号通路的影响研究

国家自然科学基金

0+阅读 · 2015年12月31日

两种杜鹃花科有毒植物中新颖结构二萜类化合物的发现及其生物活性研究

国家自然科学基金

0+阅读 · 2012年12月31日

健脾解毒方对COX-2激活JNK信号通路介导大肠癌多药耐药的调控研究

国家自然科学基金

0+阅读 · 2012年12月31日

西沙群岛四种海绵新颖结构抗肿瘤活性成分的发现研究

国家自然科学基金

0+阅读 · 2012年12月31日

TGF-β/Smads信号通路在干细胞移植治疗化疗损伤性卵巢早衰中的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

PTTG1基因在大肠癌中的表达及其临床意义

国家自然科学基金

0+阅读 · 2012年12月31日

锂离子电池三维中空锡合金/高分子膜核壳纳米线阵列电极的稳定化研究

国家自然科学基金

0+阅读 · 2011年12月31日

MFAP-1调控信使RNA剪接过程的分子遗传研究

国家自然科学基金

0+阅读 · 2009年12月31日

TGF-beta1调控成釉细胞MMP-20基因转录活性的分子机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

稀土微合金化对Cu-Al共析合金粉末凝固成形作用机理研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员