Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data. One key challenge in the usability of such data is the lack of sign annotations. Previous work exploiting such weakly-aligned data only found sparse correspondences between keywords in the subtitle and individual signs. In this work, we propose a simple, scalable framework to vastly increase the density of automatic annotations. Our contributions are the following: (1) we significantly improve previous annotation methods by making use of synonyms and subtitle-signing alignment; (2) we show the value of pseudo-labelling from a sign recognition model as a way of sign spotting; (3) we propose a novel approach for increasing our annotations of known and unknown classes based on in-domain exemplars; (4) on the BOBSL BSL sign language corpus, we increase the number of confident automatic annotations from 670K to 5M. We make these annotations publicly available to support the sign language research community.
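As a rough illustration of the pseudo-labelling idea in contribution (2), combined with the synonym expansion of contribution (1), the core filtering step could look like the sketch below: slide a sign recognition model over the continuous signing video and keep only confident predictions whose gloss (or a synonym of it) appears in the time-aligned subtitle. This is not the paper's implementation; every interface here (the `classify` model, the window iterator, the `SYNONYMS` table) is a hypothetical stand-in.

```python
# Illustrative sketch only: pseudo-labelling signs with a recognition model,
# filtered by synonym-expanded subtitle words. All interfaces are hypothetical.

from dataclasses import dataclass


@dataclass
class SpottedSign:
    gloss: str    # predicted sign class
    time: float   # timestamp (seconds) in the continuous signing video
    score: float  # classifier confidence


# Hypothetical synonym table: maps a subtitle word to the sign classes
# it could plausibly correspond to.
SYNONYMS = {
    "happy": {"happy", "glad", "pleased"},
    "house": {"house", "home"},
}


def expand_with_synonyms(subtitle_words):
    """Expand subtitle words into the set of sign classes they may match."""
    allowed = set()
    for word in subtitle_words:
        allowed |= SYNONYMS.get(word, {word})
    return allowed


def pseudo_label(windows, classify, subtitle_words, threshold=0.8):
    """Keep confident window-level predictions whose predicted class is
    licensed by the synonym-expanded subtitle aligned to this segment.

    windows: iterable of (features, timestamp) pairs from the video
    classify: model mapping features -> (gloss, confidence score)
    """
    allowed = expand_with_synonyms(subtitle_words)
    annotations = []
    for features, timestamp in windows:
        gloss, score = classify(features)
        if score >= threshold and gloss in allowed:
            annotations.append(SpottedSign(gloss, timestamp, score))
    return annotations
```

The design intuition, under these assumptions, is that the subtitle acts as a weak but cheap verifier: the recognition model proposes dense candidate signs, and only those consistent with the aligned subtitle text (after synonym expansion) are kept as automatic annotations, which is what allows the annotation density to grow well beyond sparse keyword spotting.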