BLIP-2 Papers - Zhuanzhi (专知)

BLIP-2

Fine-Tuning Vision-Language Models for Visual Navigation Assistance

Arxiv

0+ reads · September 9

ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval

Arxiv

0+ reads · March 27

PixLore: A Dataset-driven Approach to Rich Image Captioning

Arxiv

0+ reads · October 23, 2024

Enhancing Journalism with AI: A Study of Contextualized Image Captioning for News Articles using LLMs and LMMs

Arxiv

0+ reads · August 8, 2024

Hyperbolic Learning with Multimodal Large Language Models

Arxiv

0+ reads · August 9, 2024

INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

Arxiv

0+ reads · June 13, 2024

Balancing Performance and Efficiency in Zero-shot Robotic Navigation

Arxiv

0+ reads · June 5, 2024

Naming, Describing, and Quantifying Visual Objects in Humans and LLMs

Arxiv

0+ reads · June 4, 2024

MyVLM: Personalizing VLMs for User-Specific Queries

Arxiv

0+ reads · March 21, 2024

ChatGPT as a mapping assistant: A novel method to enrich maps with generative AI and content derived from street-level photographs

Arxiv

0+ reads · March 15, 2024

Naming, Describing, and Quantifying Visual Objects in Humans and LLMs

Arxiv

0+ reads · March 13, 2024

GPT-4V(ision) is a Generalist Web Agent, if Grounded

Arxiv

0+ reads · March 12, 2024

Naming, Describing, and Quantifying Visual Objects in Humans and LLMs

Arxiv

0+ reads · March 11, 2024

Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion

Arxiv

0+ reads · February 26, 2024

Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction

Arxiv

0+ reads · February 21, 2024
