LLaVA Papers - Zhuanzhi


LLaVA

Rethinking Visual Information Processing in Multimodal LLMs

Arxiv

0+ reads · November 13

Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation

Arxiv

0+ reads · November 30

FastVID: Dynamic Density Pruning for Fast Video Large Language Models

Arxiv

0+ reads · December 14

VLSBench: Unveiling Visual Leakage in Multimodal Safety

Arxiv

0+ reads · January 17

LLaVAC: Fine-tuning LLaVA as a Multimodal Sentiment Classifier

Arxiv

0+ reads · February 5

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness

Arxiv

0+ reads · February 1

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Arxiv

0+ reads · February 13

PolySmart @ TRECVid 2024 Video Captioning (VTT)

Arxiv

0+ reads · January 25

A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features

Arxiv

0+ reads · January 17

LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering

Arxiv

0+ reads · January 7

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Arxiv

0+ reads · January 7

Vision Language Models as Values Detectors

Arxiv

0+ reads · January 7

Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering

Arxiv

0+ reads · January 11

PolySmart @ TRECVid 2024 Video-To-Text

Arxiv

1+ reads · December 23, 2024

PolySmart @ TRECVid 2024 Video-To-Text

Arxiv

1+ reads · December 20, 2024

