Signers communicate by combining phonological parameters such as handshape, orientation, location, movement, and non-manual features into sign language phonemes. Linguistic research often breaks signs down into these constituent parts in order to study sign languages, and considerable effort is invested in annotating the videos. In this work we show how a single model can recognise the individual phonological parameters of sign languages, with the aim of either assisting linguistic annotation or describing signs for sign recognition models. We use the Danish Sign Language data set `Ordbog over Dansk Tegnsprog' to generate multiple data sets with a pose estimation model, which are then used to train a multi-label Fast R-CNN model. Moreover, we show that there is a significant co-dependence between the orientation and location phonological parameters in the generated data, and we incorporate this co-dependence into the model to achieve better performance.
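To make the modelling idea concrete, the following is a minimal sketch (not the authors' implementation) of a multi-label head that predicts each phonological parameter independently, plus a coupled variant in which the orientation probabilities condition the location logits, mirroring the reported orientation-location co-dependence. All class counts, weights, and the coupling mechanism are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEAT = 16      # size of a pooled region feature (assumed)
N_HANDSHAPE = 5  # illustrative class counts per phonological parameter
N_ORIENT = 4
N_LOCATION = 6

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Independent multi-label heads: one linear layer per parameter,
# with sigmoid activations so several labels can be active at once.
W_hs = rng.normal(size=(N_FEAT, N_HANDSHAPE))
W_or = rng.normal(size=(N_FEAT, N_ORIENT))
W_loc = rng.normal(size=(N_FEAT, N_LOCATION))

# Coupling matrix: feeds the orientation probabilities into the
# location head, one simple (assumed) way to encode the
# orientation-location co-dependence.
W_couple = rng.normal(size=(N_ORIENT, N_LOCATION))

def predict(feat, use_coupling=True):
    """Return multi-label probabilities per parameter for one region."""
    p_hs = sigmoid(feat @ W_hs)
    p_or = sigmoid(feat @ W_or)
    loc_logits = feat @ W_loc
    if use_coupling:
        # Condition the location prediction on the orientation estimate.
        loc_logits = loc_logits + p_or @ W_couple
    return p_hs, p_or, sigmoid(loc_logits)

feat = rng.normal(size=N_FEAT)
p_hs, p_or, p_loc = predict(feat)
print(p_hs.shape, p_or.shape, p_loc.shape)  # → (5,) (4,) (6,)
```

In practice these heads would sit on top of the region features produced by the Fast R-CNN backbone, with the sigmoid outputs trained against multi-hot parameter labels; the random weights here stand in only to show the wiring.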