Machine Translation from Signed to Spoken Languages: State of the Art and Challenges

From arXiv. This is the version of the article submitted for peer review to Universal Access in the Information Society. Please refer to "De Coster, M., Shterionov, D., Van Herreweghe, M. et al. Machine translation from signed to spoken languages: state of the art and challenges. Univ Access Inf Soc (2023)." for the published and updated version.

Automatic translation from signed to spoken languages is an interdisciplinary research domain, lying at the intersection of computer vision, machine translation and linguistics. Nevertheless, research in this domain is performed mostly by computer scientists in isolation. As the domain is becoming increasingly popular (the majority of scientific papers on the topic of sign language translation have been published in the past three years), we provide an overview of the state of the art as well as some required background in the different related disciplines. We give a high-level introduction to sign language linguistics and machine translation to illustrate the requirements of automatic sign language translation. We present a systematic literature review to illustrate the state of the art in the domain and then, harking back to the requirements, lay out several challenges for future research. We find that significant advances have been made on the shoulders of spoken language machine translation research. However, current approaches are often not linguistically motivated or are not adapted to the different input modality of sign languages. We explore challenges related to the representation of sign language data, the collection of datasets, the need for interdisciplinary research and requirements for moving beyond research, towards applications. Based on our findings, we advocate for interdisciplinary research and for basing future research on linguistic analysis of sign languages. Furthermore, the inclusion of deaf and hearing end users of sign language translation applications in use case identification, data collection and evaluation is of the utmost importance in the creation of useful sign language translation models. We recommend iterative, human-in-the-loop design and development of sign language translation models.
