Sign Language Translation (SLT) is a promising technology for bridging the communication gap between deaf and hearing people. Recently, researchers have adopted Neural Machine Translation (NMT) methods, which usually require large-scale corpora for training, to achieve SLT. However, publicly available SLT corpora are very limited, which causes the collapse of token representations and inaccuracy in the generated tokens. To alleviate this issue, we propose ConSLT, a novel token-level \textbf{Con}trastive learning framework for \textbf{S}ign \textbf{L}anguage \textbf{T}ranslation, which learns effective token representations by incorporating token-level contrastive learning into the SLT decoding process. Concretely, during decoding ConSLT treats each token and its counterpart generated under a different dropout mask as a positive pair, and then randomly samples $K$ tokens from the vocabulary that do not appear in the current sentence to construct negative examples. We conduct comprehensive experiments on two benchmarks (PHOENIX14T and CSL-Daily) in both end-to-end and cascaded settings. The experimental results demonstrate that ConSLT achieves better translation quality than strong baselines.
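To make the described objective concrete, the sketch below implements an InfoNCE-style token-level contrastive loss under the abstract's description: two decoder passes with different dropout masks yield the positive pair for each token, and $K$ vocabulary tokens absent from the current sentence serve as negatives. The function names (`token_contrastive_loss`, `sample_negatives`), tensor shapes, and temperature value are our assumptions for illustration, not the paper's reported implementation.

```python
import torch
import torch.nn.functional as F


def token_contrastive_loss(h1, h2, neg, tau=0.1):
    """InfoNCE-style token-level contrastive loss (sketch).

    h1, h2: (T, d) hidden states of the same T target tokens from two
            decoder forward passes with different dropout masks
            (the positive pairs described in the abstract).
    neg:    (T, K, d) embeddings of K vocabulary tokens per position,
            sampled from tokens not in the current sentence (negatives).
    tau:    temperature (assumed value; not specified in the abstract).
    """
    h1 = F.normalize(h1, dim=-1)
    h2 = F.normalize(h2, dim=-1)
    neg = F.normalize(neg, dim=-1)

    pos_sim = (h1 * h2).sum(-1, keepdim=True) / tau       # (T, 1)
    neg_sim = torch.einsum('td,tkd->tk', h1, neg) / tau   # (T, K)

    # Positive pair sits at index 0 of each row of logits.
    logits = torch.cat([pos_sim, neg_sim], dim=-1)        # (T, 1+K)
    labels = torch.zeros(h1.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)


def sample_negatives(vocab_size, sentence_ids, T, K):
    """Sample K vocabulary ids per position, excluding sentence tokens."""
    mask = torch.ones(vocab_size, dtype=torch.bool)
    mask[sentence_ids] = False
    candidates = torch.nonzero(mask).squeeze(-1)
    idx = torch.randint(len(candidates), (T, K))
    return candidates[idx]                                # (T, K)
```

In this reading, the sampled negative ids would be mapped through the decoder's output embedding table to obtain `neg`, and the contrastive loss would be added to the standard translation loss during training.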