Online recognition of gestures is critical for intuitive human-robot interaction (HRI) and for pushing collaborative robotics further into the market, making robots accessible to more people. The problem is that accurate gesture recognition is difficult to achieve in real unstructured environments, often from distorted and incomplete multisensory data. This paper introduces an HRI framework to classify large vocabularies of interwoven static gestures (SGs) and dynamic gestures (DGs) captured with wearable sensors. DG features are obtained by reducing the dimensionality of the raw sensor data (resampling with cubic interpolation followed by principal component analysis). Experimental tests were conducted on the UC2017 hand gesture dataset, which contains samples from eight different subjects. The classification models achieve an accuracy of 95.6% for a library of 24 SGs with a random forest and 99.3% for 10 DGs with artificial neural networks. These results compare equally or favorably with other commonly used classifiers. Long short-term memory (LSTM) deep networks achieved similar performance in online frame-by-frame classification using raw incomplete data, performing better in terms of accuracy than the static models with specially crafted features, but worse in training and inference time. The recognized gestures are used to teleoperate a robot in a collaborative process that consists of preparing a breakfast meal.
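A minimal sketch of the DG feature pipeline described above (fixed-length resampling with cubic interpolation followed by PCA) is given below. The function names and the parameter values (e.g., 50 frames, 30 principal components) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np
from scipy.interpolate import interp1d
from sklearn.decomposition import PCA


def resample_gesture(sample, n_frames=50):
    """Resample a variable-length gesture (T x C array of sensor channels)
    to a fixed number of frames using cubic interpolation over time.
    n_frames=50 is an assumed value for illustration only."""
    t_orig = np.linspace(0.0, 1.0, num=sample.shape[0])
    t_new = np.linspace(0.0, 1.0, num=n_frames)
    f = interp1d(t_orig, sample, kind="cubic", axis=0)
    return f(t_new)


def extract_dg_features(samples, n_frames=50, n_components=30):
    """Flatten the resampled gestures and reduce dimensionality with PCA.
    n_components=30 is an assumed value, not the paper's setting."""
    X = np.stack([resample_gesture(s, n_frames).ravel() for s in samples])
    pca = PCA(n_components=n_components)
    return pca.fit_transform(X), pca
```

The resulting low-dimensional feature vectors would then be fed to the offline classifiers (e.g., the random forest or feed-forward neural network mentioned above), whereas the LSTM variant operates frame by frame on the raw, possibly incomplete, sensor stream.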