FLAVA Papers - Zhuanzhi

FLAVA

AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media

arXiv

January 16, 2024

Implicit Affordance Acquisition via Causal Action-Effect Modeling in the Video Domain

arXiv

December 18, 2023

WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words

arXiv

December 7, 2023

WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words

arXiv

December 5, 2023

COLA: A Benchmark for Compositional Text-to-image Retrieval

arXiv

November 3, 2023

Lifelong Audio-video Masked Autoencoder with Forget-robust Localized Alignments

arXiv

October 12, 2023

COLA: A Benchmark for Compositional Text-to-image Retrieval

arXiv

September 8, 2023

COLA: How to adapt vision-language models to Compose Objects Localized with Attributes?

arXiv

May 5, 2023

Controlling for Stereotypes in Multimodal Language Model Evaluation

arXiv

February 3, 2023

VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models

arXiv

September 12, 2022

FLAVA: A Foundational Language And Vision Alignment Model

arXiv

March 29, 2022

FLAVA: A Foundational Language And Vision Alignment Model

arXiv

February 6, 2022

FLAVA: A Foundational Language And Vision Alignment Model

arXiv

December 8, 2021

FLAVA: Find, Localize, Adjust and Verify to Annotate LiDAR-Based Point Clouds

arXiv

November 20, 2020