Automatic speech recognition (ASR) interfaces have become increasingly popular in daily life for interacting with and controlling electronic devices. However, the interfaces currently in use are not feasible for many users, such as those with a speech disorder, locked-in syndrome, or paralysis, or people with strict privacy requirements. In such cases, an interface that can identify envisioned speech from electroencephalogram (EEG) signals can be of great benefit. Various works have targeted this problem in the past; however, there has been limited work on identifying which frequency bands ($\delta, \theta, \alpha, \beta, \gamma$) of the EEG signal contribute to envisioned speech recognition. In this work, we therefore analyze the significance of the different EEG frequency bands, and of signals obtained from different lobes of the brain, for recognizing envisioned speech. Signals obtained from different lobes and bandpass filtered into different frequency bands are fed to a spatio-temporal deep learning architecture combining a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. The performance is evaluated on a publicly available dataset comprising three classification tasks: digits, characters, and images. We obtain classification accuracies of $85.93\%$, $87.27\%$, and $87.51\%$ for the three tasks, respectively. The code for the implementation has been made available at https://github.com/ayushayt/ImaginedSpeechRecognition.
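The band-wise decomposition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the conventional EEG band edges (the paper does not state its exact cutoffs or sampling rate) and uses a zero-phase Butterworth bandpass filter from SciPy to split one EEG channel into the five bands.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Conventional EEG band edges in Hz (assumed; the paper does not list its exact cutoffs).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta":  (13.0, 30.0),
    "gamma": (30.0, 45.0),
}

def bandpass(signal, low, high, fs, order=4):
    """Zero-phase Butterworth bandpass filter for one EEG channel."""
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, signal)  # filtfilt avoids phase distortion

# Example: split a synthetic 1-second recording (fs = 128 Hz, also assumed)
# containing a 10 Hz (alpha) and a 35 Hz (gamma) component into the five bands.
fs = 128
t = np.arange(fs) / fs
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 35 * t)
band_signals = {name: bandpass(eeg, lo, hi, fs) for name, (lo, hi) in BANDS.items()}
```

Each filtered band (or signals from a single lobe) would then be fed separately to the CNN-LSTM model, so that per-band and per-lobe classification accuracies can be compared.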