Labeling is the cornerstone of supervised machine learning, which has been applied in a wide range of domains, sign language recognition among them. However, such algorithms must be fed a large amount of consistently labeled data during training to produce a model that generalizes well. In addition, there is a great need for an automated solution that works across sign languages, which differ from country to country. Although there are language-agnostic transcription systems, such as the Hamburg Sign Language Notation System (HamNoSys), which describe the signer's initial position and body movement instead of the glosses' meanings, there are still issues with providing accurate and reliable labels for every real-world use case. In this context, the industry relies heavily on manual attribution and labeling of the available video data. In this work, we tackle this issue and thoroughly analyze the HamNoSys labels provided by various maintainers of open sign language corpora in five sign languages, in order to examine the challenges encountered in labeling video data. We also investigate the consistency and objectivity of HamNoSys-based labels for the purpose of training machine learning models. Our findings provide insights into the limitations of current labeling methods and pave the way for future research on more accurate and efficient solutions for sign language recognition.