We develop Few-Shot Learning models that recognize either five or ten different dynamic hand gestures; the gestures can be exchanged arbitrarily by providing the model with one, two, or five examples per gesture. All models are based on the Relation Network (RN) Few-Shot Learning architecture, in which Long Short-Term Memory cells form the backbone. The models use hand reference points extracted from RGB video sequences of the Jester dataset, which was modified to contain 190 different types of hand gestures. Results show accuracies of up to 88.8% for the recognition of five and up to 81.2% for ten dynamic hand gestures. The research also sheds light on the potential effort savings of using a Few-Shot Learning approach instead of a traditional Deep Learning approach for detecting dynamic hand gestures. Savings were defined as the number of additional observations required when a Deep Learning model, rather than a Few-Shot Learning model, is trained on new hand gestures. Comparing the total number of observations required to achieve approximately the same accuracy indicates potential savings of up to 630 observations for five and up to 1,260 observations for ten hand gestures to be recognized. Since labeling video recordings of hand gestures requires significant effort, these savings can be considered substantial.
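To make the described architecture concrete, the following is a minimal sketch, not the authors' implementation, of a Relation Network whose embedding backbone is an LSTM over hand-keypoint sequences. All layer sizes, the keypoint dimensionality (here 21 landmarks with 3 coordinates each), the sequence length, and the 5-way 1-shot episode shape are illustrative assumptions.

```python
# Sketch of a Relation Network with an LSTM embedding backbone for
# few-shot classification of hand-keypoint sequences (PyTorch).
import torch
import torch.nn as nn


class LSTMEmbedding(nn.Module):
    """Embeds a (batch, time, features) keypoint sequence into a fixed vector."""

    def __init__(self, in_dim=63, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)      # final hidden state summarizes the sequence
        return h_n[-1]                  # (batch, hidden)


class RelationModule(nn.Module):
    """Scores how well each query embedding matches each class prototype."""

    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, proto_emb, query_emb):
        # proto_emb: (n_way, hidden), query_emb: (n_query, hidden)
        n_way, n_query = proto_emb.size(0), query_emb.size(0)
        pairs = torch.cat(
            [proto_emb.unsqueeze(0).expand(n_query, -1, -1),
             query_emb.unsqueeze(1).expand(-1, n_way, -1)], dim=-1)
        return self.net(pairs).squeeze(-1)   # (n_query, n_way) relation scores


# Example 5-way 1-shot episode with random stand-in data.
embed, relate = LSTMEmbedding(), RelationModule()
support = torch.randn(5, 1, 30, 63)      # 5 classes, 1 shot, 30 frames, 63 features
query = torch.randn(10, 30, 63)          # 10 query sequences
proto = embed(support.flatten(0, 1)).view(5, 1, -1).mean(dim=1)  # class prototypes
scores = relate(proto, embed(query))     # predicted class = argmax over scores
```

In this sketch, increasing the number of shots per class only changes how the class prototypes are averaged; the relation module and the LSTM backbone stay the same, which is what allows the gesture set to be swapped without retraining from scratch.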