Sign language recognition (SLR) facilitates communication between deaf and hearing communities. Deep learning-based SLR models are commonly used but require extensive computational resources, making them unsuitable for deployment on edge devices. To address these limitations, we propose a lightweight SLR system that combines parallel bidirectional reservoir computing (PBRC) with MediaPipe. MediaPipe enables real-time hand tracking and precise extraction of hand joint coordinates, which serve as input features for the PBRC architecture. The proposed PBRC architecture consists of two echo state network (ESN)-based bidirectional reservoir computing (BRC) modules arranged in parallel to capture temporal dependencies, thereby creating a rich feature representation for classification. We trained our PBRC-based SLR system on the Word-Level American Sign Language (WLASL) video dataset, achieving top-1, top-5, and top-10 accuracies of 60.85%, 85.86%, and 91.74%, respectively. Training time was significantly reduced to 18.67 seconds due to the intrinsic properties of reservoir computing, compared to over 55 minutes for deep learning-based methods such as Bi-GRU. This approach offers a lightweight, cost-effective solution for real-time SLR on edge devices.
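The parallel bidirectional reservoir idea described above can be sketched as follows. This is a minimal illustration only: the reservoir size, spectral radius, leak rate, and the 42-dimensional landmark layout are assumptions for the example, not the settings or code reported in the paper, and the trained linear readout is omitted.

```python
import numpy as np

N_IN = 42    # e.g. 21 MediaPipe hand landmarks x (x, y) per frame (assumed layout)
N_RES = 200  # reservoir size per module (illustrative, not the paper's setting)

def make_reservoir(seed, spectral_radius=0.9):
    """Create one fixed (untrained) ESN: input weights and recurrent weights."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))
    W = rng.uniform(-0.5, 0.5, (N_RES, N_RES))
    # Rescale recurrent weights so the spectral radius is below 1 (echo state property).
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    return W_in, W

def run(W_in, W, seq, leak=0.3):
    """Drive the fixed reservoir with a landmark sequence; return the final state."""
    x = np.zeros(N_RES)
    for u in seq:
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
    return x

def brc(W_in, W, seq):
    """Bidirectional pass: concatenate forward and time-reversed final states."""
    return np.concatenate([run(W_in, W, seq), run(W_in, W, seq[::-1])])

# Parallel arrangement: two independently initialised BRC modules whose
# feature vectors are concatenated into one rich representation.
mod_a, mod_b = make_reservoir(0), make_reservoir(1)
seq = np.random.default_rng(2).standard_normal((30, N_IN))  # 30 frames of features
features = np.concatenate([brc(*mod_a, seq), brc(*mod_b, seq)])
# `features` (length 4 * N_RES) would feed a trained linear readout/classifier.
```

Because only the linear readout on top of `features` is trained (the reservoir weights stay fixed), training reduces to a single least-squares fit, which is what makes the reported seconds-scale training time plausible compared with backpropagation through a Bi-GRU.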