Zelda: Video Analytics using Vision-Language Models - 专知论文

会员服务 ·

0

模型评估 · MoDELS · 总回报 · 多样性 · 可辨认的 ·

2023 年 5 月 5 日

Zelda: Video Analytics using Vision-Language Models

翻译：暂无翻译

Francisco Romero,Caleb Winston,Johann Hauswald,Matei Zaharia,Christos Kozyrakis

Advances in ML have motivated the design of video analytics systems that allow for structured queries over video datasets. However, existing systems limit query expressivity, require users to specify an ML model per predicate, rely on complex optimizations that trade off accuracy for performance, and return large amounts of redundant and low-quality results. This paper focuses on the recently developed Vision-Language Models (VLMs) that allow users to query images using natural language like "cars during daytime at traffic intersections." Through an in-depth analysis, we show VLMs address three limitations of current video analytics systems: general expressivity, a single general purpose model to query many predicates, and are both simple and fast. However, VLMs still return large numbers of redundant and low-quality results, which can overwhelm and burden users. We present Zelda: a video analytics system that uses VLMs to return both relevant and semantically diverse results for top-K queries on large video datasets. Zelda prompts the VLM with the user's query in natural language and additional terms to improve accuracy and identify low-quality frames. Zelda improves result diversity by leveraging the rich semantic information encoded in VLM embeddings. We evaluate Zelda across five datasets and 19 queries and quantitatively show it achieves higher mean average precision (up to 1.15$\times$) and improves average pairwise similarity (up to 1.16$\times$) compared to using VLMs out-of-the-box. We also compare Zelda to a state-of-the-art video analytics engine and show that Zelda retrieves results 7.5$\times$ (up to 10.4$\times$) faster for the same accuracy and frame diversity.

翻译：暂无翻译

0

相关内容

模型评估

机器学习系统设计系统评估标准

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

臭氧光催化转化的基础研究

国家自然科学基金

0+阅读 · 2014年12月31日

PPAR β/δ基因在结直肠癌血管生成调控中的作用及分子机理

国家自然科学基金

2+阅读 · 2014年12月31日

线粒体TRAP1抑制肾小管上皮细胞凋亡在肾间质纤维化中的作用机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

激光诱导荧光光谱研究二氧化硫和硫化氢离子低电子态结构

国家自然科学基金

0+阅读 · 2012年12月31日

高效ⅤB /ⅡB族复合光催化剂分级结构的构筑及光生载流子传输机制

国家自然科学基金

0+阅读 · 2012年12月31日

氩等离子体弧处理金刚石自支撑膜强度提高的机制

国家自然科学基金

0+阅读 · 2012年12月31日

低层错能镍基变形高温合金反常动态应变时效机理

国家自然科学基金

0+阅读 · 2011年12月31日

面向文本分类的迁移学习和半监督学习方法研究

国家自然科学基金

1+阅读 · 2011年12月31日

Dirichlet空间的分析与几何

国家自然科学基金

0+阅读 · 2011年12月31日

多金属空心球的电化学催化作用及金属间协同效应的研究

国家自然科学基金

0+阅读 · 2009年12月31日

From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought

Arxiv

0+阅读 · 2023年6月23日

SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification

Arxiv

0+阅读 · 2023年6月22日

Limits for Learning with Language Models

Arxiv

0+阅读 · 2023年6月21日

MedNgage: A Dataset for Understanding Engagement in Patient-Nurse Conversations

Arxiv

0+阅读 · 2023年6月20日

RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model

Arxiv

0+阅读 · 2023年6月20日

Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference

Arxiv

0+阅读 · 2023年6月19日

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

Arxiv

1+阅读 · 2023年6月18日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

VIP会员

文章信息

相关主题

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《国防仿真模型的优化与分析》

《利用大语言模型（LLM）优化海军陆战队经验教训学习》2025年最新103页

《杀伤网与精确规模：智能饱和战争时代的战略要务-印度视角》2025最新报告

超越第一人称视角（FPV）无人机：汲取俄乌战争的全部教训

相关资讯

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

相关论文

From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought

Arxiv

0+阅读 · 2023年6月23日

SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification

Arxiv

0+阅读 · 2023年6月22日

Limits for Learning with Language Models

Arxiv

0+阅读 · 2023年6月21日

MedNgage: A Dataset for Understanding Engagement in Patient-Nurse Conversations

Arxiv

0+阅读 · 2023年6月20日

RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model

Arxiv

0+阅读 · 2023年6月20日

Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference

Arxiv

0+阅读 · 2023年6月19日

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

Arxiv

1+阅读 · 2023年6月18日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

相关基金

臭氧光催化转化的基础研究

国家自然科学基金

0+阅读 · 2014年12月31日

PPAR β/δ基因在结直肠癌血管生成调控中的作用及分子机理

国家自然科学基金

2+阅读 · 2014年12月31日

线粒体TRAP1抑制肾小管上皮细胞凋亡在肾间质纤维化中的作用机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

激光诱导荧光光谱研究二氧化硫和硫化氢离子低电子态结构

国家自然科学基金

0+阅读 · 2012年12月31日

高效ⅤB /ⅡB族复合光催化剂分级结构的构筑及光生载流子传输机制

国家自然科学基金

0+阅读 · 2012年12月31日

氩等离子体弧处理金刚石自支撑膜强度提高的机制

国家自然科学基金

0+阅读 · 2012年12月31日

低层错能镍基变形高温合金反常动态应变时效机理

国家自然科学基金

0+阅读 · 2011年12月31日

面向文本分类的迁移学习和半监督学习方法研究

国家自然科学基金

1+阅读 · 2011年12月31日

Dirichlet空间的分析与几何

国家自然科学基金

0+阅读 · 2011年12月31日

多金属空心球的电化学催化作用及金属间协同效应的研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员