Sign Language Translation (SLT) has received relatively little attention compared to Sign Language Recognition (SLR). However, SLR recognizes the unique grammar of sign language, which differs from that of spoken language, so its output cannot easily be interpreted by non-disabled people. We therefore address the problem of translating sign language video directly into spoken language. To this end, we propose a new keypoint normalization method that performs translation based on the signer's skeleton points and normalizes these points robustly for sign language translation; a normalization scheme customized to each body part contributes to the performance improvement. In addition, we propose a stochastic frame selection method that enables frame augmentation and sampling at the same time. Finally, the video is translated into the spoken language through an attention-based translation model. Our method can be applied to various datasets, since it does not require gloss annotations. Quantitative experimental evaluation demonstrates the effectiveness of our method.
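The abstract gives no implementation details, so the Python sketch below only illustrates the two ideas it names: part-wise keypoint normalization and stochastic frame selection. The keypoint grouping in PART_SLICES, the centroid/std normalization, the bin-based random sampling, and the function names normalize_keypoints and stochastic_frame_selection are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical keypoint groups; the exact grouping and reference scheme
# are assumptions, not taken from the paper.
PART_SLICES = {
    "body":  slice(0, 25),    # pose keypoints
    "lhand": slice(25, 46),   # left-hand keypoints
    "rhand": slice(46, 67),   # right-hand keypoints
}

def normalize_keypoints(frames):
    """Normalize (T, K, 2) keypoints part by part.

    Each body part is shifted by its own per-frame centroid and scaled by its
    own extent, a simple stand-in for the part-wise normalization the abstract
    describes.
    """
    frames = frames.astype(np.float32).copy()
    for sl in PART_SLICES.values():
        part = frames[:, sl, :]                       # (T, P, 2)
        center = part.mean(axis=1, keepdims=True)     # per-frame centroid
        scale = part.std(axis=(1, 2), keepdims=True) + 1e-6
        frames[:, sl, :] = (part - center) / scale
    return frames

def stochastic_frame_selection(num_frames, target_len, rng=None):
    """Pick `target_len` frame indices, one random index per equal-width bin.

    Splitting the video into equal segments and drawing one frame per segment
    yields a different subsequence on every call, so it acts as sampling
    (fixed output length) and augmentation (randomness) at the same time.
    """
    rng = rng or np.random.default_rng()
    edges = np.linspace(0, num_frames, target_len + 1)
    lo = edges[:-1]
    hi = np.maximum(edges[1:], lo + 1)                # keep each bin non-empty
    idx = rng.integers(lo.astype(int), hi.astype(int))
    return np.clip(idx, 0, num_frames - 1)

if __name__ == "__main__":
    video_keypoints = np.random.rand(120, 67, 2)      # 120 frames, 67 keypoints
    idx = stochastic_frame_selection(len(video_keypoints), target_len=32)
    clip = normalize_keypoints(video_keypoints[idx])  # (32, 67, 2) normalized clip
    print(clip.shape)
```

In such a pipeline, the normalized, sampled keypoint sequence would then be fed to the attention-based translation model mentioned in the abstract.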