Existing work on sign language translation (i.e., translation from sign language videos into sentences in a written language) has focused mainly on (1) data collected in a controlled environment or (2) data in a specific domain, which limits applicability to real-world settings. In this paper, we introduce OpenASL, a large-scale American Sign Language (ASL)-English dataset collected from online video sites (e.g., YouTube). OpenASL contains 288 hours of ASL videos in multiple domains from over 200 signers and is the largest publicly available ASL translation dataset to date. To tackle the challenges of sign language translation in realistic settings and without glosses, we propose a set of techniques including sign search as a pretext task for pre-training and fusion of mouthing and handshape features. The proposed techniques produce consistent and large improvements in translation quality over baseline models based on prior work. Our data and code are publicly available at https://github.com/chevalierNoir/OpenASL.
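The abstract mentions fusing mouthing and handshape features before translation. The snippet below is a minimal sketch of one plausible way to do such per-frame fusion (concatenation followed by a linear projection); the module name, feature dimensions, and downstream encoder are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: fuse two per-frame feature streams (e.g., mouthing and
# handshape) by concatenation plus a linear projection, producing features that
# could then feed a standard translation encoder-decoder.
import torch
import torch.nn as nn

class StreamFusion(nn.Module):
    """Concatenation-based fusion of two frame-aligned feature streams."""
    def __init__(self, mouth_dim=512, hand_dim=512, out_dim=512):
        super().__init__()
        self.proj = nn.Linear(mouth_dim + hand_dim, out_dim)

    def forward(self, mouth_feats, hand_feats):
        # Both inputs: (batch, time, feature_dim), aligned frame by frame.
        fused = torch.cat([mouth_feats, hand_feats], dim=-1)
        return self.proj(fused)

# Usage with dummy tensors standing in for extracted mouthing/handshape features.
fusion = StreamFusion()
mouth = torch.randn(2, 100, 512)
hand = torch.randn(2, 100, 512)
out = fusion(mouth, hand)  # shape: (2, 100, 512)
```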