Sign Language Production (SLP) is a challenging task, given the limited resources available and the inherent diversity within sign data. As a result, previous works have suffered from the problem of regression to the mean, leading to under-articulated and incomprehensible signing. In this paper, we propose using dictionary examples to create expressive sign language sequences. However, simply concatenating the signs would create robotic and unnatural sequences. Therefore, we present a 7-step approach to effectively stitch the signs together. First, by normalising each sign into a canonical pose, cropping and stitching we create a continuous sequence. Then by applying filtering in the frequency domain and resampling each sign we create cohesive natural sequences, that mimic the prosody found in the original data. We leverage the SignGAN model to map the output to a photo-realistic signer and present a complete Text-to-Sign (T2S) SLP pipeline. Our evaluation demonstrates the effectiveness of this approach, showcasing state-of-the-art performance across all datasets.
翻译:暂无翻译