This work studies the task of glossification, which aims to transcribe natural spoken-language sentences into ordered sign language glosses for the Deaf and hard-of-hearing community. Previous sequence-to-sequence language models trained on paired sentence-gloss data often fail to capture the rich connections between the two distinct languages, leading to unsatisfactory transcriptions. We observe that, despite their different grammars, glosses effectively simplify sentences to ease deaf communication while sharing a large portion of their vocabulary with the source sentences. This motivates us to implement glossification by executing a collection of editing actions, e.g., word addition, deletion, and copying, called editing programs, on the natural spoken-language counterparts. Specifically, we design a new neural agent that learns to synthesize and execute editing programs, conditioned on sentence contexts and partial editing results. The agent is trained to imitate minimal editing programs while exploring the program space more widely via policy gradients to optimize sequence-wise transcription quality. Results show that our approach outperforms previous glossification models by a large margin.
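To make the editing-program idea concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: a tiny interpreter that executes a sequence of hypothetical ADD/DEL/COPY actions over an input sentence to produce a gloss sequence. The action encoding, function name, and example tokens are all assumptions for illustration.

```python
def execute_program(sentence_tokens, program):
    """Apply editing actions left-to-right over the input tokens.

    Hypothetical action encoding (for illustration only):
      ("COPY",)      -- copy the current input token to the output
      ("DEL",)       -- skip (delete) the current input token
      ("ADD", word)  -- emit `word` without consuming any input
    """
    output, i = [], 0
    for action in program:
        if action[0] == "COPY":
            output.append(sentence_tokens[i])
            i += 1
        elif action[0] == "DEL":
            i += 1  # consume the input token, emit nothing
        elif action[0] == "ADD":
            output.append(action[1])
    return output

# Example: glosses drop function words and keep/insert content words.
tokens = ["tomorrow", "it", "will", "be", "sunny"]
program = [("COPY",), ("DEL",), ("DEL",), ("DEL",), ("ADD", "SUNNY")]
print(execute_program(tokens, program))  # ['tomorrow', 'SUNNY']
```

Because glosses reuse much of the sentence vocabulary, COPY handles most tokens cheaply, while ADD and DEL cover the grammatical divergence between the two languages; the neural agent's job is to emit such an action sequence conditioned on the sentence and the partial output.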