Sign language recognition (SLR) aims to overcome the communication barrier for people who are deaf or hard of hearing. Most existing approaches fall into two lines, skeleton-based and RGB-based methods, and both have limitations. RGB-based approaches usually overlook the fine-grained hand structure, while skeleton-based methods do not take facial expression into account. To address both limitations, we propose a new framework based on RGB parts, named Spatial-temporal Part-aware network (StepNet). As the name implies, StepNet consists of two modules: Part-level Spatial Modeling and Part-level Temporal Modeling. Without using any keypoint-level annotations, Part-level Spatial Modeling implicitly captures appearance-based properties, such as those of the hands and face, in the feature space. Part-level Temporal Modeling, in turn, captures the pertinent properties over time by implicitly mining long- and short-term context. Extensive experiments show that StepNet, thanks to its spatial-temporal modules, achieves competitive Top-1 per-instance accuracy on three widely used SLR benchmarks: 56.89% on WLASL, 77.2% on NMFs-CSL, and 77.1% on BOBSL. Moreover, the proposed method is compatible with optical flow input and can yield higher performance when the two are fused. We hope that this work can serve as a preliminary step for people who are deaf or hard of hearing.
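The abstract reports Top-1 per-instance accuracy and notes that fusing the RGB model with optical flow input can raise performance, without stating how the fusion is done. The following is a minimal sketch, assuming a simple late fusion that averages the per-class softmax scores of the two streams; the function names, the `flow_weight` parameter, and the toy logits are all hypothetical and are not taken from StepNet.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(rgb_logits, flow_logits, flow_weight=0.5):
    """Assumed fusion rule: weighted average of the two streams' probabilities."""
    rgb = softmax(rgb_logits)
    flow = softmax(flow_logits)
    return [(1 - flow_weight) * r + flow_weight * f for r, f in zip(rgb, flow)]


def top1_per_instance_accuracy(scores, labels):
    """Fraction of videos whose highest-scoring class matches the label."""
    correct = 0
    for class_scores, label in zip(scores, labels):
        predicted = max(range(len(class_scores)), key=class_scores.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Toy logits for three videos and three sign classes (made up for illustration).
rgb_logits = [[2.0, 1.0, 0.0], [0.2, 0.1, 0.0], [0.0, 0.0, 3.0]]
flow_logits = [[1.0, 1.5, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
labels = [0, 1, 2]

rgb_only = top1_per_instance_accuracy(rgb_logits, labels)
fused = top1_per_instance_accuracy(
    [fuse_scores(r, f) for r, f in zip(rgb_logits, flow_logits)], labels
)
print(f"RGB only: {rgb_only:.3f}, RGB + flow: {fused:.3f}")
```

Per-instance accuracy counts every test video equally, so frequent classes weigh more than rare ones; a per-class variant would instead average the accuracy computed separately for each sign.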