Sign languages are used as a primary language by approximately 70 million D/deaf people worldwide. However, most communication technologies operate in spoken and written languages, creating inequities in access. To help tackle this problem, we release ASL Citizen, the largest Isolated Sign Language Recognition (ISLR) dataset to date, collected with consent and containing 83,912 videos for 2,731 distinct signs filmed by 52 signers in a variety of environments. We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their own webcam with the aim of retrieving matching signs from a dictionary. We show that training supervised machine learning classifiers with our dataset greatly advances the state of the art on metrics relevant for dictionary retrieval, achieving, for instance, 62% accuracy and a recall-at-10 of 90%, evaluated entirely on videos of users who are not present in the training or validation sets. An accessible PDF of this article is available at https://aashakadesai.github.io/research/ASL_Dataset__arxiv_.pdf
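To make the reported metrics concrete, below is a minimal sketch of how top-1 accuracy and recall-at-10 are typically computed for dictionary retrieval, assuming a classifier that scores each query video against every sign in the vocabulary. All names and data here are illustrative, not drawn from the ASL Citizen codebase.

```python
import numpy as np

def retrieval_metrics(scores: np.ndarray, labels: np.ndarray, k: int = 10):
    """scores: (num_queries, vocab_size) classifier scores per sign;
    labels: (num_queries,) index of the correct sign for each query."""
    # Rank signs from highest to lowest score for each query video.
    ranking = np.argsort(-scores, axis=1)
    # Top-1 accuracy: the correct sign is ranked first.
    top1 = np.mean(ranking[:, 0] == labels)
    # Recall-at-k: the correct sign appears among the first k results.
    recall_k = np.mean((ranking[:, :k] == labels[:, None]).any(axis=1))
    return top1, recall_k

# Toy example: 3 query videos scored over a 5-sign vocabulary.
scores = np.array([[0.10, 0.70, 0.10, 0.05, 0.05],
                   [0.30, 0.20, 0.40, 0.05, 0.05],
                   [0.25, 0.25, 0.20, 0.20, 0.10]])
labels = np.array([1, 2, 4])
print(retrieval_metrics(scores, labels, k=2))  # (~0.67, ~0.67)
```

In this framing, each dictionary lookup is a ranking problem: the classifier's scores over the full vocabulary order the candidate signs, and recall-at-10 measures how often the intended sign appears on the first page of results shown to the user.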