Sign language gloss translation aims to translate sign glosses into spoken language texts, which is challenging due to the scarcity of labeled gloss-text parallel data. Back translation (BT), which generates pseudo-parallel data by translating in-domain spoken language texts into sign glosses, has been applied to alleviate this data scarcity problem. However, the lack of large-scale, high-quality in-domain spoken language text data limits the effectiveness of BT. In this paper, to overcome this limitation, we propose a Prompt-based domain text Generation (PGEN) approach to produce large-scale in-domain spoken language text data. Specifically, PGEN randomly concatenates sentences from the original in-domain spoken language text data as prompts to induce a pre-trained language model (i.e., GPT-2) to generate spoken language texts in a similar style. Experimental results on three benchmarks of sign language gloss translation in varied languages demonstrate that BT with spoken language texts generated by PGEN significantly outperforms the compared methods. In addition, as the scale of spoken language texts generated by PGEN increases, the BT technique achieves further improvements, demonstrating the effectiveness of our approach. We release the code and data to facilitate future research in this field.
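To make the prompt-based generation idea concrete, here is a minimal sketch assuming the HuggingFace transformers library and an off-the-shelf GPT-2 checkpoint; the function name, sampling hyperparameters, and prompt size are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of prompt-based in-domain text generation (PGEN-style), under the
# assumptions stated above: sample a few in-domain sentences, concatenate
# them as a prompt, and let GPT-2 continue in a similar style.
import random
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def generate_in_domain_text(corpus, num_prompt_sents=3, max_new_tokens=64):
    """Hypothetical helper: build a prompt from randomly sampled in-domain
    sentences and return GPT-2's continuation as pseudo in-domain text."""
    prompt = " ".join(random.sample(corpus, num_prompt_sents))
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,   # sampling yields diverse generations for scaling up BT
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens; keep only the newly generated continuation.
    continuation = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return continuation

# Usage: pass a list of in-domain spoken language sentences; the returned
# text can then be back-translated into glosses to form pseudo-parallel data.
```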