Human-Machine Interaction (HMI) is an important research topic in machine learning that has been investigated intensively thanks to the growth of computing power in recent years. For the first time, machine learning can be used to classify images and videos in place of traditional computer vision algorithms. The aim of this paper is to combine a convolutional neural network (CNN) with a recurrent neural network (RNN) to recognize cultural/anthropological Italian sign language gestures from videos. The CNN extracts important features that are then used by the RNN. The RNN stores temporal information inside the model, so that contextual information from previous frames can be used to improve prediction accuracy. Our novel approach works on RGB frames only and uses several data augmentation techniques and regularization methods to avoid overfitting and achieve a small generalization error.
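The CNN-to-RNN data flow described above can be illustrated with a minimal sketch. It is not the architecture used in this paper: it assumes single-channel frames instead of RGB, one convolutional layer with global average pooling, a plain Elman recurrent cell, and random untrained weights. All names (`cnn_features`, `rnn_step`, `classify_video`, `random_params`) and all sizes are hypothetical and chosen only for illustration.

```python
import math
import random


def conv2d_valid(frame, kernel):
    """'Valid' 2D cross-correlation of one single-channel frame with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(frame), len(frame[0])
    return [[sum(frame[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def cnn_features(frame, kernels):
    """Conv layer + ReLU + global average pooling: one feature per kernel."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(frame, kernel)
        vals = [max(0.0, v) for row in fmap for v in row]
        feats.append(sum(vals) / len(vals))
    return feats


def rnn_step(x, h, w_xh, w_hh, b_h):
    """Elman cell: h_new = tanh(W_xh x + W_hh h + b_h)."""
    return [math.tanh(sum(w_xh[i][j] * x[j] for j in range(len(x)))
                      + sum(w_hh[i][j] * h[j] for j in range(len(h)))
                      + b_h[i])
            for i in range(len(h))]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def classify_video(frames, params):
    """Run the CNN on every frame, feed the features to the RNN in order,
    and classify the gesture from the final hidden state."""
    h = [0.0] * len(params["b_h"])
    for frame in frames:
        x = cnn_features(frame, params["kernels"])
        h = rnn_step(x, h, params["w_xh"], params["w_hh"], params["b_h"])
    logits = [sum(w * hv for w, hv in zip(row, h)) + b
              for row, b in zip(params["w_hy"], params["b_y"])]
    return softmax(logits)


def random_params(n_kernels=4, ksize=3, hidden=8, n_classes=5, seed=0):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "kernels": [mat(ksize, ksize) for _ in range(n_kernels)],
        "w_xh": mat(hidden, n_kernels),
        "w_hh": mat(hidden, hidden),
        "b_h": [0.0] * hidden,
        "w_hy": mat(n_classes, hidden),
        "b_y": [0.0] * n_classes,
    }


# Toy input: a "video" of 6 random 8x8 frames.
rng = random.Random(1)
video = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(6)]
print(classify_video(video, random_params()))
```

The hidden state `h` is the only thing carried from one frame to the next, which is how context from earlier frames reaches the final prediction. A practical system would more likely use a deep learning framework, a deeper CNN, and a gated recurrent cell, with the weights learned from labeled videos.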