GA for feature selection of EEG heterogeneous data

Electroencephalographic (EEG) signals provide highly informative data on brain activity and function. However, their heterogeneity and high dimensionality can be an obstacle to interpretation. Introducing a priori knowledge seems the best option for mitigating high-dimensionality problems, but it can discard information and patterns present in the data, while data heterogeneity remains an open issue that often makes generalization difficult. In this study, we propose a genetic algorithm (GA) for feature selection that can be used with either a supervised or an unsupervised approach. Our proposal considers three different fitness functions and does not rely on expert knowledge. Starting from two publicly available datasets on cognitive workload and motor movement/imagery, the EEG signals are processed and normalized, and their features are computed in the time, frequency and time-frequency domains. Feature selection is performed by applying the proposed GA and is compared with two benchmarking techniques. The results show that different combinations of our proposal achieve better results than the benchmarks in terms of overall performance and feature reduction. Moreover, the proposed GA, based on a novel fitness function presented here, outperforms the benchmarks when the two datasets are merged, showing the effectiveness of our proposal on heterogeneous data.
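To make the general idea concrete, the following is a minimal sketch of GA-based feature selection in the supervised setting. It is not the authors' method: the abstract does not specify the three fitness functions, so the `fitness` used here (nearest-centroid training accuracy minus a per-feature penalty) is a hypothetical stand-in, and the toy data, function names and all parameter values are assumptions chosen only for illustration.

```python
import random

def make_data(rng, n_samples=60, n_features=12, informative=(0, 1, 2)):
    """Toy two-class data (assumption): only `informative` columns depend on the label."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def fitness(mask, X, y, penalty=0.01):
    """Hypothetical fitness: nearest-centroid accuracy minus a penalty per selected feature."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set is useless
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        dists = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                 for c in centroids}
        correct += min(dists, key=dists.get) == t
    return correct / len(X) - penalty * len(cols)

def select_features(X, y, rng, pop_size=30, generations=40, p_mut=0.05):
    """Evolve binary masks (1 = keep feature) with elitism, tournaments,
    one-point crossover and bit-flip mutation."""
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(m, X, y), m) for m in pop),
                        key=lambda fm: fm[0], reverse=True)
        next_pop = [list(m) for _, m in scored[:2]]  # keep the two best masks
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(scored, 3), key=lambda fm: fm[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda fm: fm[0])[1]
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            next_pop.append([1 - b if rng.random() < p_mut else b for b in child])
        pop = next_pop
    return max(pop, key=lambda m: fitness(m, X, y))

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    best = select_features(X, y, rng)
    print("selected feature indices:", [j for j, b in enumerate(best) if b])
```

An unsupervised variant would keep the same evolutionary loop and swap `fitness` for a label-free criterion, such as a clustering quality score computed on the selected columns; which criterion the paper actually uses is not stated in the abstract.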

