We present an efficient framework for building a corpus for sign language translation. Aided by a simple but effective data augmentation technique, our method converts text into annotated forms with minimal information loss. Sign languages are composed of manual signals, non-manual signals, and iconic features. According to professional sign language interpreters, non-manual signals such as facial expressions and gestures play an important role in conveying exact meaning. By taking the linguistic features of sign language into account, our framework is the first attempt to build a multimodal sign language augmentation corpus (hereinafter the KoSLA corpus) that contains both manual and non-manual modalities. To overcome data scarcity, we use data augmentation techniques such as synonym replacement to make better use of the available data and improve our translation model, while preserving the grammatical and semantic structures of sign language. The corpus yields promising results in the hospital context, where performance improves with the augmented datasets. As experimental support, we verify the effectiveness of the data augmentation technique and the usefulness of our corpus by running a translation task between ordinary sentences and sign language annotations with two tokenizers. The results show that the BLEU scores obtained with the KoSLA corpus were significant.
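The synonym replacement mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym table, the gloss notation with a bracketed non-manual marker, and the function name `augment` are all hypothetical, and the sketch assumes that each synonym maps to the same sign, so the annotation can be reused unchanged.

```python
from itertools import product

# Hypothetical synonym table; the lexicon actually used for the corpus
# is not described in the abstract.
SYNONYMS = {
    "doctor": ["physician"],
    "hurts": ["aches"],
}

def augment(sentence, gloss, synonyms=SYNONYMS, max_variants=10):
    """Return (sentence variant, gloss) pairs built by synonym replacement.

    The gloss is copied as-is for every variant, under the assumption that
    the swapped words are rendered with the same manual and non-manual signals.
    """
    tokens = sentence.split()
    # For each token, the candidates are the token itself plus its synonyms.
    options = [[tok] + synonyms.get(tok.lower(), []) for tok in tokens]
    pairs = []
    for combo in product(*options):
        variant = " ".join(combo)
        if variant == sentence:
            continue  # skip the unmodified original
        pairs.append((variant, gloss))
        if len(pairs) >= max_variants:
            break
    return pairs

# Hypothetical annotation: manual glosses plus a bracketed non-manual marker.
example = augment("my stomach hurts", "STOMACH HURT [face:pain]")
```

Each source sentence yields several text variants paired with one annotation, which enlarges the training set without altering the sign language side of the pair.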