A Fine-Grained Visual Attention Approach for Fingerspelling Recognition in the Wild

Fingerspelling is the means by which sign language communicates technical terms and proper nouns that have no dedicated sign. Automatic recognition of fingerspelling can help reduce communication barriers when interacting with deaf people. The main challenges in fingerspelling recognition are the ambiguity of the gestures and the strong articulation of the hands, so a recognition model must cope with both high inter-class visual similarity and high intra-class variation. Most existing research on fingerspelling recognition has focused on datasets collected in controlled environments. A recently collected large-scale annotated fingerspelling dataset in the wild, drawn from social media and online platforms, captures the challenges of a real-world scenario. In this work, we propose a fine-grained visual attention mechanism using the Transformer model for the sequence-to-sequence prediction task on this in-the-wild dataset. The fine-grained attention is achieved by using the change in motion between video frames (optical flow) in sequential context-based attention, together with a Transformer encoder model. The model is trained on unsegmented continuous video by jointly balancing the Connectionist Temporal Classification (CTC) loss and the maximum-entropy loss. The proposed approach can capture better fine-grained attention in a single iteration. Experimental evaluations show that it outperforms the state-of-the-art approaches.
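The abstract does not spell out how the CTC loss and the maximum-entropy loss are balanced, so the following is only a minimal sketch of one plausible reading: the CTC negative log-likelihood minus a weighted mean per-frame entropy. The weight `lam`, the blank index `BLANK`, the sign convention, and all function names are assumptions made for illustration, not the authors' formulation.

```python
import math

BLANK = 0  # index of the CTC blank symbol (an assumption of this sketch)


def ctc_neg_log_likelihood(probs, target):
    """CTC loss for one sequence, computed with the forward algorithm.

    probs:  list of T per-frame probability distributions over the vocabulary.
    target: list of label indices without blanks.
    Works in probability space for readability; a real implementation
    would work in log space to avoid underflow on long videos.
    """
    # Interleave blanks around the labels: [b, l1, b, l2, ..., b]
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]
    size = len(ext)

    # alpha[s]: total probability of all partial alignments ending at ext[s]
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood)


def mean_frame_entropy(probs):
    """Average Shannon entropy (in nats) of the per-frame distributions."""
    total = 0.0
    for frame in probs:
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(probs)


def joint_loss(probs, target, lam=0.1):
    """Hypothetical joint objective: CTC loss minus lam times the entropy.

    Subtracting the entropy rewards less over-confident frame predictions;
    lam trades the two terms off against each other.
    """
    return ctc_neg_log_likelihood(probs, target) - lam * mean_frame_entropy(probs)
```

For example, `joint_loss([[0.5, 0.5], [0.5, 0.5]], [1], lam=0.1)` scores a two-frame clip with a two-symbol vocabulary against the single target label `1`. Raising `lam` puts more weight on keeping the frame-level predictions high-entropy, which is one way to discourage over-confident decisions between visually similar handshapes.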

Related content

Attention mechanism

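The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Bahdanau et al. then used an attention-like mechanism in "Neural Machine Translation by Jointly Learning to Align and Translate" [1] to perform translation and alignment jointly on a machine translation task; their work is generally regarded as the first to apply attention to NLP. Similar attention-based extensions of RNN models were subsequently applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of progress in attention research.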

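CVPR 2021 accepted papers list: all 1,663 papers are here

Zhuanzhi (专知) Member Services

32+ reads · June 12, 2021

5 recent must-read CVPR 2021 papers on image/video captioning, with code

Zhuanzhi (专知) Member Services

48+ reads · April 25, 2021

5 recent must-read CVPR 2021 papers on action recognition, with code

Zhuanzhi (专知) Member Services

60+ reads · March 17, 2021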