Systems that can efficiently search collections of sign language videos have been highlighted as a useful application of sign language technology. However, the problem of searching videos beyond individual keywords has received limited attention in the literature. To address this gap, in this work we introduce the task of sign language retrieval with free-form textual queries: given a written query (e.g., a sentence) and a large collection of sign language videos, the objective is to find the signing video in the collection that best matches the written query. We propose to tackle this task by learning cross-modal embeddings on the recently introduced large-scale How2Sign dataset of American Sign Language (ASL). We identify that a key bottleneck in the performance of the system is the quality of the sign video embedding, which suffers from a scarcity of labeled training data. We therefore propose SPOT-ALIGN, a framework for interleaving iterative rounds of sign spotting and feature alignment to expand the scope and scale of available training data. We validate the effectiveness of SPOT-ALIGN for learning a robust sign video embedding through improvements in both sign recognition and the proposed video retrieval task.
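To make the cross-modal embedding approach concrete, the sketch below shows one common way such a retrieval system can be set up: text queries and sign videos are projected into a shared embedding space, trained with a symmetric InfoNCE-style contrastive loss, and retrieval ranks videos by cosine similarity to the query. This is a minimal, hypothetical illustration; the feature dimensions, projection heads, loss, and hyperparameters here are assumptions for exposition, not the paper's exact architecture.

```python
# Minimal sketch of text-to-sign-video retrieval via a joint embedding space.
# All dimensions and modules are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Project precomputed text and video features into a shared space."""
    def __init__(self, text_dim=768, video_dim=1024, embed_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.video_proj = nn.Linear(video_dim, embed_dim)

    def forward(self, text_feats, video_feats):
        # L2-normalize so a dot product equals cosine similarity.
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        return t, v

def contrastive_loss(t, v, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (text, video) pairs."""
    logits = t @ v.T / temperature                  # (B, B) similarity matrix
    targets = torch.arange(len(t), device=t.device) # matched pairs on diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

model = JointEmbedding()

# Training step on a dummy batch of 32 matched (text, video) feature pairs.
t, v = model(torch.randn(32, 768), torch.randn(32, 1024))
loss = contrastive_loss(t, v)

# Retrieval: rank every video in the collection against one written query.
query_feat = torch.randn(1, 768)     # stand-in for an encoded sentence
collection = torch.randn(500, 1024)  # stand-in for encoded sign videos
with torch.no_grad():
    t, v = model(query_feat, collection)
    ranking = (t @ v.T).argsort(dim=-1, descending=True)
print("Top-5 video indices:", ranking[0, :5].tolist())
```

In a deployed search system, the video side of the embedding would typically be precomputed offline for the whole collection, so that answering a query only requires embedding the sentence and running a nearest-neighbor search.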