Sign language is a visual language used by deaf or speech-impaired people to communicate with each other. It is performed through fast transitions of hand gestures and body postures, and understanding it requires a great amount of knowledge and training. Sign language recognition has therefore become a useful yet challenging task in computer vision. Skeleton-based action recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance. However, skeleton-based recognition can hardly be applied to sign language recognition tasks, mainly because the skeleton data contain no indication of hand gestures or facial expressions. Inspired by the recent development of whole-body pose estimation \cite{jin2020whole}, we propose recognizing sign language based on whole-body key points and features. The recognition results are further ensembled with other modalities of RGB and optical flow to improve the accuracy further. In the ChaLearn challenge on isolated sign language recognition, which uses a new large-scale multi-modal Turkish Sign Language dataset (AUTSL), our method achieved leading accuracy in both the development phase and the test phase. This manuscript is a fact-sheet version; our workshop paper will be released soon. Our code has been made available at https://github.com/jackyjsy/CVPR21Chal-SLR
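The multi-modal ensemble mentioned above can be illustrated with a minimal late-fusion sketch: per-modality class scores are combined with a weighted sum before taking the argmax. The function name, weights, and placeholder scores below are illustrative assumptions, not the exact values or code used in our method.

\begin{verbatim}
import numpy as np

def ensemble_scores(score_dict, weights):
    """Late (score-level) fusion: weighted sum of per-modality class scores.

    score_dict maps modality name -> array of shape (num_samples, num_classes);
    weights maps modality name -> scalar weight (illustrative values only).
    """
    fused = None
    for name, scores in score_dict.items():
        contribution = weights.get(name, 1.0) * scores
        fused = contribution if fused is None else fused + contribution
    return fused.argmax(axis=1)  # predicted class index per sample

# Example with random placeholder scores for three modalities
# (AUTSL contains 226 sign classes).
rng = np.random.default_rng(0)
scores = {
    "skeleton": rng.random((4, 226)),
    "rgb": rng.random((4, 226)),
    "flow": rng.random((4, 226)),
}
preds = ensemble_scores(scores, weights={"skeleton": 1.0, "rgb": 0.9, "flow": 0.5})
print(preds)
\end{verbatim}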