[CVPR 2024 Tutorial] From Multimodal Large Language Models to Human-Level AI: Modality, Instruction, Reasoning, Efficiency and Beyond (200+ slides)

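Welcome to the CVPR 2024 tutorial series on multimodal large language models (MLLMs). Artificial intelligence (AI) encompasses knowledge acquisition and real-world grounding across many modalities. As a multidisciplinary research field, MLLMs have recently attracted growing attention in both academia and industry, and they show an unprecedented trend toward achieving human-level AI. By integrating and modeling diverse information modalities, including language, visual, auditory, and sensory data, these large models offer an effective vehicle for understanding, reasoning, and planning. This tutorial aims to give a comprehensive review of cutting-edge research on MLLMs, focusing on four key areas: MLLM architecture design, instruction learning and hallucination, multimodal reasoning, and efficient learning in MLLMs. We will explore technical advances, synthesize the key challenges, and discuss potential directions for future research.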
