Computational sign language research lacks the large-scale datasets that enable the creation of useful real-life applications. To date, most research has been limited to prototype systems on small domains of discourse, e.g. weather forecasts. To address this issue and to push the field forward, we release six datasets comprising 190 hours of footage on the larger domain of news. Of this, 20 hours of footage have been annotated by Deaf experts and interpreters and are made publicly available for research purposes. In this paper, we share the dataset collection process and the tools developed to enable the alignment of sign language video and subtitles, as well as baseline translation results to underpin future research.