The goal of this work is to detect and recognize sequences of letters signed using fingerspelling in British Sign Language (BSL). Previous fingerspelling recognition methods have not focused on BSL, whose signing alphabet differs substantially from that of American Sign Language (ASL) (e.g., it is two-handed rather than one-handed). They also rely on manual annotations for training. In contrast, our method uses only weak annotations from subtitles for training. We localize potential instances of fingerspelling using a simple feature similarity method, then automatically annotate these instances by querying subtitle words and searching for corresponding mouthing cues from the signer. We propose a Transformer architecture adapted to this task, with a multiple-hypothesis CTC loss function to learn from alternative annotation possibilities. We employ a multi-stage training approach, in which an initial version of our trained model is used to extend and enhance our training data before re-training to achieve better performance. Through extensive evaluations, we verify our method for automatic annotation and our model architecture. Moreover, to support sign language research, we provide a test set of 5K video clips annotated by human experts for evaluating BSL fingerspelling recognition methods.
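To make the idea of learning from alternative annotation possibilities concrete, the following is a minimal sketch of one plausible form of a multiple-hypothesis CTC loss. It is an illustration under stated assumptions, not the formulation used in this work: the abstract does not say how the hypotheses are combined, so the sketch assumes that the likelihoods of the candidate letter sequences are summed (marginalized) before taking the negative log. The function names `ctc_log_likelihood` and `multi_hypothesis_ctc_loss`, the class indexing, and the use of plain Python instead of a deep learning framework are all hypothetical choices made for readability.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_likelihood(log_probs, target, blank=0):
    """Standard CTC forward algorithm: log p(target | frames).

    log_probs: list of T frames, each a list of per-class log-probabilities.
    target: list of class indices without blanks.
    """
    # Interleave blanks: [a, b] becomes [blank, a, blank, b, blank].
    ext = [blank]
    for c in target:
        ext += [c, blank]
    size = len(ext)

    # Initialisation: a path may start with the first blank or the first label.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][blank]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * size
        for s in range(size):
            cands = [alpha[s]]                      # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])          # advance by one symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])          # skip the blank in between
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    if size > 1:
        return logsumexp(alpha[-1], alpha[-2])
    return alpha[-1]


def multi_hypothesis_ctc_loss(log_probs, hypotheses, blank=0):
    """ASSUMPTION: combine hypotheses by summing their CTC likelihoods."""
    scores = [ctc_log_likelihood(log_probs, h, blank) for h in hypotheses]
    return -logsumexp(*scores)


# Toy usage: 2 frames, 3 classes (0 = blank, 1 and 2 = two letters), uniform
# predictions, and two candidate annotations for the same clip.
uniform = [[math.log(1.0 / 3.0)] * 3 for _ in range(2)]
loss = multi_hypothesis_ctc_loss(uniform, [[1], [2]])
print(f"{loss:.4f}")  # 0.4055, i.e. -log(1/3 + 1/3)
```

Under this assumed formulation, the model is not penalized for preferring one candidate annotation over another, only for assigning low total probability to the set of candidates. Other choices, such as taking the best-scoring hypothesis, would fit the same interface.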