Advances in ubiquitous computing have brought new pervasive sensing methods into practice, enabling new features and stimulating research on human-computer interaction. This paper presents a hand gesture recognition method that uses a smartphone's built-in speakers and microphones. The proposed system emits an inaudible, ultrasonic sonar-based signal from the smartphone's stereo speakers; the signal is then received by the smartphone's microphone and processed by a Convolutional Neural Network (CNN) for hand gesture recognition. Data augmentation techniques are proposed to improve detection accuracy, and three dual-channel input fusion methods are compared. The first method merges the dual-channel audio into a single input spectrogram image. The second method adopts early fusion by concatenating the dual-channel spectrograms. The third method adopts late fusion: two convolutional input branches each process one of the dual-channel spectrograms, and their outputs are merged in the final layers. Our experimental results demonstrate promising detection accuracy for the six gestures in our publicly available dataset, with a baseline accuracy of 93.58\%.
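The three fusion methods differ mainly in where the two audio channels are combined. The following is a minimal sketch of that difference, not the implementation used in this work. It assumes that merging means averaging the two channels before computing one spectrogram, and that early fusion stacks the two spectrograms as input channels. The function names, frame sizes, and the `mean_pool_branch` stand-in for a convolutional branch are all hypothetical.

```python
import cmath
import math


def spectrogram(signal, frame_len=16, hop=8):
    """Magnitude spectrogram via a plain DFT; returns a (time, freq) nested list."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def merge_to_single(left, right):
    """Method 1 (assumed averaging): mix to mono, then one spectrogram -> (1, T, F)."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    return [spectrogram(mono)]


def early_fusion(left, right):
    """Method 2 (assumed channel stacking): two spectrograms as one input -> (2, T, F)."""
    return [spectrogram(left), spectrogram(right)]


def mean_pool_branch(spec):
    """Hypothetical stand-in for a convolutional branch: average each bin over time."""
    n_frames = len(spec)
    return [sum(frame[k] for frame in spec) / n_frames
            for k in range(len(spec[0]))]


def late_fusion(left, right, branch=mean_pool_branch):
    """Method 3: each channel goes through its own branch; features are merged last."""
    return branch(spectrogram(left)) + branch(spectrogram(right))


# Toy stereo input: the right channel is a delayed copy of the left one.
left = [math.sin(2 * math.pi * 4 * n / 64) for n in range(64)]
right = left[3:] + left[:3]

single_input = merge_to_single(left, right)
early_input = early_fusion(left, right)
late_features = late_fusion(left, right)
```

In this sketch the merged and early-fusion inputs would feed a single network, while late fusion keeps the channels separate until the feature vectors are concatenated.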