We use insights from research on American Sign Language (ASL) phonology to train models for isolated sign language recognition (ISLR), a step towards automatic sign language understanding. Our key insight is that explicitly modeling the role of phonology in sign production yields more accurate ISLR than existing work, which does not consider sign language phonology. We train ISLR models that take as input pose estimates of a signer producing a single sign and predict not only the sign but also its phonological characteristics, such as the handshape. These auxiliary predictions lead to a nearly 9% absolute gain in sign recognition accuracy on the WLASL benchmark, with consistent improvements in ISLR regardless of the underlying prediction model architecture. This work has the potential to accelerate linguistic research on signed languages and to reduce communication barriers between deaf and hearing people.
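The auxiliary-prediction idea described in the abstract can be illustrated with a minimal sketch of a multi-task training objective: a cross-entropy loss on the sign label plus one cross-entropy loss per phonological characteristic. This is not the authors' implementation. The function names, the `aux_weight` hyperparameter, the "location" feature, and all numeric values are assumptions made for illustration; only "handshape" is named in the abstract.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy for one example, given raw logits and the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(sign_logits, sign_target, phon_logits, phon_targets, aux_weight=1.0):
    """Sign loss plus a weighted sum of one loss per phonological feature.

    phon_logits and phon_targets are dicts keyed by feature name.
    aux_weight is a hypothetical hyperparameter, not a value from the source.
    """
    sign_loss = softmax_cross_entropy(sign_logits, sign_target)
    aux_loss = sum(
        softmax_cross_entropy(phon_logits[name], phon_targets[name])
        for name in phon_targets
    )
    return sign_loss + aux_weight * aux_loss


# Toy outputs of a hypothetical pose-based model for a single example.
sign_logits = [2.0, 0.5, -1.0]             # scores over 3 candidate signs
phon_logits = {
    "handshape": [0.1, 1.2],               # feature named in the abstract
    "location": [0.0, 0.0, 0.0],           # assumed extra feature, for illustration only
}
phon_targets = {"handshape": 1, "location": 2}

loss = multitask_loss(sign_logits, 0, phon_logits, phon_targets, aux_weight=0.5)
print(f"combined loss: {loss:.4f}")
```

Setting `aux_weight=0.0` recovers a sign-only objective, which gives a simple baseline for measuring how much the auxiliary phonological targets contribute.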