Pervasive Hand Gesture Recognition for Smartphones using Non-audible Sound and Deep Learning

Rapid advances in ubiquitous technologies have brought new pervasive methods into practice, enabling innovative features and stimulating research on new human-computer interactions. This paper presents a hand gesture recognition method that uses the smartphone's built-in speakers and microphones. The proposed system emits an inaudible, ultrasonic sonar-based signal from the smartphone's stereo speakers; the signal is then received by the smartphone's microphone and processed by a Convolutional Neural Network (CNN) for hand gesture recognition. Data augmentation techniques are proposed to improve detection accuracy, and three dual-channel input fusion methods are compared. The first method merges the dual-channel audio into a single input spectrogram image. The second method adopts early fusion by concatenating the dual-channel spectrograms. The third method adopts late fusion: two convolutional input branches each process one of the dual-channel spectrograms, and their outputs are merged by the last layers. Our experimental results demonstrate promising detection accuracy for the six gestures in our publicly available dataset, with a baseline accuracy of 93.58%.
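
To make the three dual-channel input fusion methods concrete, the following is a minimal NumPy sketch of how the network inputs could be prepared in each case. It is an illustration under stated assumptions, not the authors' implementation: the function names, the STFT parameters (`n_fft`, `hop`), averaging the channels for the merged input, and stacking along the channel axis for early fusion are all hypothetical choices that the abstract does not specify. The CNN itself is omitted.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time FFT.

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    n_fft and hop are assumed values, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

def merged_input(left, right):
    # Method 1 (assumed form): mix both audio channels into one signal
    # and compute a single spectrogram image -> shape (1, F, T).
    return spectrogram(0.5 * (left + right))[np.newaxis]

def early_fusion_input(left, right):
    # Method 2 (assumed form): concatenate the two spectrograms along
    # the channel axis of one network input -> shape (2, F, T).
    return np.stack([spectrogram(left), spectrogram(right)])

def late_fusion_inputs(left, right):
    # Method 3: keep the spectrograms separate, one per convolutional
    # branch; the branch outputs would be merged by later layers.
    return spectrogram(left)[np.newaxis], spectrogram(right)[np.newaxis]

if __name__ == "__main__":
    # Hypothetical recording: a 20 kHz tone sampled at 48 kHz for 0.1 s,
    # with the second channel attenuated and slightly delayed.
    sr, f0 = 48_000, 20_000
    t = np.arange(int(0.1 * sr)) / sr
    left = np.sin(2 * np.pi * f0 * t)
    right = 0.5 * np.sin(2 * np.pi * f0 * (t - 1e-4))

    print(merged_input(left, right).shape)
    print(early_fusion_input(left, right).shape)
    print([a.shape for a in late_fusion_inputs(left, right)])
```

In this sketch the merged and early-fusion variants feed one network input (with one and two channels respectively), while the late-fusion variant yields two single-channel inputs, each intended for its own branch.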
