Electroencephalographic (EEG) signals provide highly informative data on brain activity and function. However, their heterogeneity and high dimensionality can hinder interpretation. Introducing a priori knowledge appears to be the best option for mitigating the high-dimensionality problem, but it may discard information and patterns present in the data; data heterogeneity, meanwhile, remains an open issue that often makes generalization difficult. In this study, we propose a genetic algorithm (GA) for feature selection that can be used in either a supervised or an unsupervised setting. Our proposal considers three different fitness functions and does not rely on expert knowledge. Starting from two publicly available datasets, one on cognitive workload and one on motor movement/imagery, the EEG signals are processed and normalized, and their features are computed in the time, frequency and time-frequency domains. Feature selection is performed with the proposed GA and compared against two benchmark techniques. The results show that different combinations of our proposal outperform the benchmarks in terms of both overall performance and feature reduction. Moreover, the proposed GA, when based on a novel fitness function presented here, outperforms the benchmarks when the two datasets are merged, showing the effectiveness of our proposal on heterogeneous data.
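To make the idea of GA-based feature selection concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the three fitness functions, the genetic operators, or the hyperparameters, so everything here is an assumption made for illustration: the `fitness` function (nearest-centroid accuracy minus a small penalty on the fraction of features kept), the operators (binary tournament selection, uniform crossover, bit-flip mutation, elitism), the names `ga_select`, `penalty` and `p_mut`, and the synthetic data standing in for EEG features.

```python
import numpy as np

def fitness(mask, X, y, penalty=0.01):
    """Hypothetical supervised fitness (an assumption, not from the paper):
    nearest-centroid training accuracy minus a penalty on the kept fraction."""
    if not mask.any():
        return -np.inf  # an empty feature subset is never acceptable
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    # Squared distance of every sample to every class centroid.
    d = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    acc = (classes[d.argmin(axis=1)] == y).mean()
    return acc - penalty * mask.mean()

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Evolve boolean feature masks; return the best evaluated mask and its fitness."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5
    # Start from the full feature set as the baseline to beat.
    best_mask = np.ones(n_feat, dtype=bool)
    best_fit = fitness(best_mask, X, y)
    for _ in range(generations):
        fits = np.array([fitness(ind, X, y) for ind in pop])
        i = fits.argmax()
        if fits[i] > best_fit:
            best_fit, best_mask = fits[i], pop[i].copy()
        # Binary tournament selection: keep the fitter of two random individuals.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Uniform crossover between each parent and its neighbour.
        mates = np.roll(parents, 1, axis=0)
        cross = rng.random((pop_size, n_feat)) < 0.5
        children = np.where(cross, parents, mates)
        # Bit-flip mutation, then elitism: reinsert the best mask seen so far.
        flip = rng.random((pop_size, n_feat)) < p_mut
        pop = children ^ flip
        pop[0] = best_mask
    return best_mask, best_fit

# Synthetic stand-in for an EEG feature matrix (assumed data, not from the paper).
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 20))
X[:, :3] += 2.0 * y[:, None]  # only the first three features carry class information

mask, score = ga_select(X, y)
print("selected features:", np.flatnonzero(mask), "fitness:", round(float(score), 3))
```

For the unsupervised setting mentioned in the abstract, the same loop could in principle be reused by swapping `fitness` for a label-free criterion, such as a clustering quality score computed on the selected columns; which criteria the authors actually use is not stated here.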