For the constrained condition of the Isometric Spoken Language Translation Task of the IWSLT 2022 evaluation, AppTek developed neural Transformer-based systems for English-to-German with various mechanisms of length control. These range from source-side and target-side pseudo-tokens to an encoding of the remaining length in characters that replaces the positional encoding. We further increased translation length compliance by sentence-level selection of length-compliant hypotheses from different system variants, as well as by rescoring N-best candidates from a single system. Length-compliant back-translated and forward-translated synthetic data, as well as other parallel data variants derived from the original MuST-C training corpus, were important for a good trade-off between translation quality and desired length. Our experimental results show that length compliance levels above 90% can be reached while minimizing losses in MT quality as measured in BERT and BLEU scores.
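The sentence-level selection of length-compliant hypotheses mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a hypothesis counts as length-compliant when its character count is within 10% of the source character count (the abstract does not state the threshold), and that the candidates from the different system variants arrive ordered by preference. The function names `length_ratio`, `is_length_compliant`, and `select_hypothesis` are hypothetical.

```python
from typing import Sequence


def length_ratio(source: str, hypothesis: str) -> float:
    """Ratio of hypothesis length to source length, counted in characters."""
    return len(hypothesis) / max(len(source), 1)


def is_length_compliant(source: str, hypothesis: str, tolerance: float = 0.10) -> bool:
    """Assumed criterion: hypothesis length within +/- tolerance of source length."""
    return abs(length_ratio(source, hypothesis) - 1.0) <= tolerance


def select_hypothesis(source: str, hypotheses: Sequence[str], tolerance: float = 0.10) -> str:
    """Pick one translation per sentence from several system variants.

    `hypotheses` is assumed to be ordered by preference (e.g. the output of
    the system with the best expected quality first). The first compliant
    candidate wins; if none is compliant, fall back to the candidate whose
    length ratio is closest to 1.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    for hyp in hypotheses:
        if is_length_compliant(source, hyp, tolerance):
            return hyp
    return min(hypotheses, key=lambda h: abs(length_ratio(source, h) - 1.0))


if __name__ == "__main__":
    src = "Thank you very much."
    candidates = [
        "Ich danke Ihnen wirklich sehr herzlich.",  # too long for the source
        "Vielen Dank an Sie.",                      # close to the source length
    ]
    print(select_hypothesis(src, candidates))
```

Ordering the candidates by expected quality means that a shorter or longer variant is used only for sentences where the preferred system fails the length check, which limits the quality loss to those sentences.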