State-of-the-art sign language generation frameworks lack expressivity and naturalness, largely because they focus only on manual signs and neglect the affective, grammatical, and semantic functions of facial expressions. The purpose of this work is to augment the semantic representation of sign language by grounding facial expressions. We study how modeling the relationship between text, gloss, and facial expressions affects the performance of sign generation systems. In particular, we propose a Dual Encoder Transformer that generates manual signs as well as facial expressions by capturing the similarities and differences between text and sign gloss annotations. We account for the role of facial muscle activity in expressing the intensity of manual signs and are the first to employ facial action units in sign language generation. A series of experiments shows that our proposed model improves the quality of automatically generated sign language.
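To make the dual-encoder idea concrete, here is a minimal schematic sketch in numpy. Everything in it is an illustrative assumption rather than the paper's actual architecture: the hidden size, the single-head attention used for fusion and decoding, the random projections standing in for trained Transformer encoder stacks, and the output dimensions (50 2-D pose keypoints, 17 facial action units) are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention (single head, no learned projections).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

D = 16                          # hidden size (assumed)
T_TEXT, T_GLOSS, T_OUT = 5, 4, 6

# Hypothetical encoder outputs: random vectors stand in for the two
# Transformer encoders over spoken-language text and sign gloss tokens.
text_tokens  = rng.normal(size=(T_TEXT, D))
gloss_tokens = rng.normal(size=(T_GLOSS, D))

# Fusion: each stream cross-attends to the other, so the model can
# capture the similarities and differences between text and gloss.
text_fused  = attention(text_tokens, gloss_tokens, gloss_tokens)
gloss_fused = attention(gloss_tokens, text_tokens, text_tokens)
memory = np.concatenate([text_fused, gloss_fused], axis=0)

# Decoder queries attend over the fused memory; two output heads then
# predict manual signs (pose keypoints) and facial action unit (AU)
# intensities for each output frame.
queries = rng.normal(size=(T_OUT, D))
decoded = attention(queries, memory, memory)

W_pose = rng.normal(size=(D, 2 * 50))   # 50 2-D keypoints (assumed)
W_au   = rng.normal(size=(D, 17))       # 17 AUs (assumed)

pose_out = decoded @ W_pose                    # (T_OUT, 100) manual-sign stream
au_out   = 1 / (1 + np.exp(-decoded @ W_au))   # (T_OUT, 17) AU intensities in [0, 1]
```

The point of the sketch is the data flow: two encoders, cross-attention fusion, and a decoder with separate manual and facial output heads, so that AU intensities are predicted jointly with the pose stream rather than bolted on afterwards.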