The objective of this work is to determine the location of temporal boundaries between signs in continuous sign language videos. Our approach employs 3D convolutional neural network representations with iterative temporal segment refinement to resolve ambiguities between sign boundary cues. We demonstrate the effectiveness of our approach on the BSLCORPUS, PHOENIX14 and BSL-1K datasets, showing considerable improvement over the prior state of the art and the ability to generalise to new signers, languages and domains.