Advances in automatic sign language translation (SLT) into spoken languages have mostly been benchmarked on datasets of limited size and restricted domain. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using reduced BLEU as the validation metric instead of the widely used BLEU score. We report a BLEU score of 8.03 and publish the first open-source implementation of its kind to promote further advances.