Efficient sign language recognition system and dataset creation method based on deep learning and image processing

New deep-learning architectures are created every year, achieving state-of-the-art results in image recognition and leading to the belief that, in a few years, complex tasks such as sign language translation will be considerably easier, serving as a communication tool for the hearing-impaired community. On the other hand, these algorithms still need a lot of training data, and the dataset creation process is expensive and time-consuming. Therefore, this work investigates digital image processing and machine learning techniques that can be used to create a sign language dataset effectively. We examine data acquisition choices, such as the frame rate at which videos are captured or subsampled, the background type, preprocessing, and data augmentation, using convolutional neural networks and object detection to build an image classifier and comparing the results with statistical tests. Different datasets were created to test the hypotheses, containing 14 words used daily and recorded by different smartphones in the RGB color system. We achieved an accuracy of 96.38% on the test set and 81.36% on the validation set, which contains more challenging conditions. The results show that subsampling at 30 FPS is the best frame rate for training the classifier, that geometric transformations work better than intensity transformations, and that artificial background creation does not improve model generalization. These trade-offs should be considered in future work as a cost-benefit guideline between computational cost and accuracy gain when creating a dataset and training a sign recognition model.
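To make the terms in the abstract concrete, the following is a minimal Python sketch of frame-rate subsampling and of one geometric and one intensity transformation. It is an illustration under stated assumptions, not the pipeline used in this work: the function names (`subsample_indices`, `translate_x`, `adjust_brightness`), the choice of a horizontal shift and a brightness offset, and the grayscale list-of-rows image format are all hypothetical, since the abstract does not specify which transformations or tools were applied.

```python
import math
from typing import List

# Assumption: a grayscale image stored as rows of 0-255 pixel values.
Image = List[List[int]]


def subsample_indices(n_frames: int, src_fps: float, target_fps: float) -> List[int]:
    """Return the indices of the frames kept when reducing src_fps to target_fps."""
    if src_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= src_fps:
        return list(range(n_frames))  # nothing to drop
    step = src_fps / target_fps  # source frames per kept frame
    n_kept = math.ceil(n_frames / step)
    return [int(i * step) for i in range(n_kept)]


def translate_x(image: Image, shift: int) -> Image:
    """Geometric transformation: shift pixels horizontally, filling with zeros."""
    width = len(image[0])
    shifted = []
    for row in image:
        if shift >= 0:
            new_row = [0] * min(shift, width) + row[:max(width - shift, 0)]
        else:
            new_row = row[-shift:] + [0] * min(-shift, width)
        shifted.append(new_row)
    return shifted


def adjust_brightness(image: Image, delta: int) -> Image:
    """Intensity transformation: add delta to every pixel, clipped to 0-255."""
    return [[min(255, max(0, pixel + delta)) for pixel in row] for row in image]


if __name__ == "__main__":
    # A 10-frame clip recorded at 30 FPS, subsampled to 10 FPS.
    print(subsample_indices(10, 30, 10))       # [0, 3, 6, 9]
    print(translate_x([[1, 2, 3]], 1))         # [[0, 1, 2]]
    print(adjust_brightness([[250, 5]], 10))   # [[255, 15]]
```

A geometric transformation moves pixels without changing their values, while an intensity transformation changes values without moving pixels. The abstract's comparison between the two families is a comparison between augmentations of these two kinds.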
