

March 21, 2023

Natural Language-Assisted Sign Language Recognition


Ronglai Zuo, Fangyun Wei, Brian Mak

From arXiv. Accepted by CVPR 2023. Code is available at https://github.com/FangyunWei/SLRT

Sign languages are visual languages which convey information by signers' handshape, facial expression, body movement, and so forth. Due to the inherent restriction of combinations of these visual ingredients, there exist a significant number of visually indistinguishable signs (VISigns) in sign languages, which limits the recognition capacity of vision neural networks. To mitigate the problem, we propose the Natural Language-Assisted Sign Language Recognition (NLA-SLR) framework, which exploits semantic information contained in glosses (sign labels). First, for VISigns with similar semantic meanings, we propose language-aware label smoothing by generating soft labels for each training sign whose smoothing weights are computed from the normalized semantic similarities among the glosses to ease training. Second, for VISigns with distinct semantic meanings, we present an inter-modality mixup technique which blends vision and gloss features to further maximize the separability of different signs under the supervision of blended labels. Besides, we also introduce a novel backbone, video-keypoint network, which not only models both RGB videos and human body keypoints but also derives knowledge from sign videos of different temporal receptive fields. Empirically, our method achieves state-of-the-art performance on three widely-adopted benchmarks: MSASL, WLASL, and NMFs-CSL. Codes are available at https://github.com/FangyunWei/SLRT.
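The language-aware label smoothing described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name, the use of cosine similarity, the softmax temperature `tau`, and the smoothing weight `eps` are all assumptions made here for illustration. The only idea taken from the abstract is that the off-target mass of each soft label is distributed according to normalized semantic similarities among the glosses.

```python
import numpy as np

def language_aware_soft_labels(gloss_emb, labels, eps=0.2, tau=0.5):
    """Build soft labels whose off-target mass follows gloss similarity.

    gloss_emb: (num_classes, dim) gloss embeddings, assumed precomputed
               (e.g. from some pretrained word-embedding model).
    labels:    (batch,) integer class indices.
    eps, tau:  smoothing weight and temperature (illustrative values).
    """
    labels = np.asarray(labels)
    rows = np.arange(len(labels))

    # Cosine similarity between every pair of glosses.
    emb = gloss_emb / np.linalg.norm(gloss_emb, axis=1, keepdims=True)
    sim = emb @ emb.T

    # Softmax over the *other* classes: mask out the true class first.
    logits = sim[labels] / tau                  # (batch, num_classes), a copy
    logits[rows, labels] = -np.inf
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # eps of the mass goes to semantically similar glosses, the rest to the target.
    soft = eps * weights
    soft[rows, labels] = 1.0 - eps
    return soft
```

With this construction, a gloss that is semantically close to the true label receives a larger share of the smoothing mass than an unrelated gloss, whereas vanilla label smoothing would spread `eps` uniformly.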

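The inter-modality mixup mentioned in the abstract blends vision and gloss features and supervises the result with blended labels. The sketch below shows one plausible reading of that idea, assuming the gloss features have already been projected to the same dimension as the vision features. The function names, the Beta-distributed mixing ratio, and the random pairing of glosses are assumptions borrowed from standard mixup, not details confirmed by the abstract.

```python
import numpy as np

def inter_modality_mixup(vision_feat, vision_labels, gloss_feat, alpha=0.8, rng=None):
    """Blend each vision feature with a randomly chosen gloss feature.

    vision_feat:   (batch, dim) features from a video backbone.
    vision_labels: (batch,) integer class indices of the videos.
    gloss_feat:    (num_classes, dim) gloss features, assumed to share `dim`.
    alpha:         Beta distribution parameter (illustrative value).
    """
    rng = np.random.default_rng() if rng is None else rng
    vision_labels = np.asarray(vision_labels)
    batch = vision_feat.shape[0]
    num_classes = gloss_feat.shape[0]
    rows = np.arange(batch)

    lam = rng.beta(alpha, alpha)                          # mixing ratio in [0, 1]
    gloss_labels = rng.integers(0, num_classes, size=batch)

    # Blend the two modalities in feature space.
    mixed_feat = lam * vision_feat + (1.0 - lam) * gloss_feat[gloss_labels]

    # Blend the labels with the same ratio; mass adds up if both labels coincide.
    mixed_label = np.zeros((batch, num_classes))
    mixed_label[rows, vision_labels] += lam
    mixed_label[rows, gloss_labels] += 1.0 - lam
    return mixed_feat, mixed_label, lam

def soft_cross_entropy(logits, soft_targets):
    """Mean cross-entropy between classifier logits and soft (blended) targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(soft_targets * log_probs).sum(axis=1).mean())
```

In a training loop, `mixed_feat` would be passed through a classification head and the resulting logits scored against `mixed_label` with `soft_cross_entropy`, alongside the ordinary loss on the unmixed vision features.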
