Visual Understanding Papers - Zhuanzhi

Visual Understanding

VoQA: Visual-only Question Answering

Arxiv

0+ reads · November 30

It Hears, It Sees too: Multi-Modal LLM for Depression Detection By Integrating Visual Understanding into Audio Language Models

Arxiv

0+ reads · December 11

It Hears, It Sees too: Multi-Modal LLM for Depression Detection By Integrating Visual Understanding into Audio Language Models

Arxiv

0+ reads · November 25

TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models

Arxiv

0+ reads · November 6

EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

Arxiv

0+ reads · December 15

Segment Everything Everywhere All at Once

Arxiv

0+ reads · May 1, 2023

Token Turing Machines

Arxiv

0+ reads · April 13, 2023

Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings

Arxiv

0+ reads · March 30, 2023
