Advances in automatic sign language translation (SLT) into spoken languages have mostly been benchmarked on datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using reduced BLEU as the reference metric for validation instead of the widely used BLEU score. We report a BLEU score of 8.03 and publish the first open-source implementation of its kind to promote further advances.