Sign language is a beautiful visual language and the primary language of people with hearing or speech impairments. However, its many complex expressions are difficult for the general public to understand and master. Sign language recognition algorithms can significantly ease communication between hearing-impaired and hearing people. Traditional continuous sign language recognition typically relies on sequence learning with a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. Such methods learn spatial and temporal features separately and therefore cannot capture the complex spatial-temporal features of sign language; LSTMs also struggle to learn long-term dependencies. To alleviate these problems, this paper proposes a multi-view spatial-temporal continuous sign language recognition network consisting of three parts. The first is a Multi-View Spatial-Temporal Feature Extractor Network (MSTN), which directly extracts spatial-temporal features from RGB and skeleton data; the second is a Transformer-based sign language encoder network, which learns long-term dependencies; the third is a Connectionist Temporal Classification (CTC) decoder network, which predicts the meaning of the whole continuous sign language sequence. We test our algorithm on two public sign language datasets, SLR-100 and RWTH-PHOENIX-Weather 2014T. The word error rate is 1.9% on SLR-100 and 22.8% on RWTH-PHOENIX-Weather.
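To make the last stage and the reported metric concrete, the following is a minimal sketch of greedy (best-path) CTC decoding and of the word error rate computed over gloss sequences. It is not the authors' implementation: the abstract does not say which decoding strategy the CTC decoder uses, and the function names, the blank index of 0, and the toy inputs are assumptions made for illustration only.

```python
from typing import Hashable, List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse consecutive repeated labels, then drop blanks.

    `frame_ids` holds the most likely label for each video frame,
    for example the per-frame argmax of the encoder output.
    """
    decoded: List[int] = []
    previous = None
    for label in frame_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def word_error_rate(reference: Sequence[Hashable],
                    hypothesis: Sequence[Hashable]) -> float:
    """Edit distance between gloss sequences divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one gloss")
    # prev_row[j] is the distance between the reference prefix seen so far
    # and the first j glosses of the hypothesis.
    prev_row = list(range(len(hypothesis) + 1))
    for i, ref_gloss in enumerate(reference, start=1):
        row = [i]
        for j, hyp_gloss in enumerate(hypothesis, start=1):
            substitution = prev_row[j - 1] + (0 if ref_gloss == hyp_gloss else 1)
            deletion = prev_row[j] + 1       # reference gloss missing in hypothesis
            insertion = row[j - 1] + 1       # extra gloss in hypothesis
            row.append(min(substitution, deletion, insertion))
        prev_row = row
    return prev_row[-1] / len(reference)


if __name__ == "__main__":
    frames = [0, 3, 3, 0, 3, 5, 5, 0]
    print(ctc_greedy_decode(frames))
    print(word_error_rate(["A", "B", "C", "D"], ["A", "X", "C"]))
```

In this sketch a blank between two identical labels keeps them as separate glosses, which is what lets CTC represent a repeated sign. A word error rate of 1.9% therefore means roughly two substitution, deletion, or insertion errors per hundred reference glosses.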