Ensemble feature selection with clustering for analysis of high-dimensional, correlated clinical data in the search for Alzheimer's disease biomarkers

6 July 2022

Annette Spooner, Gelareh Mohammadi, Perminder S. Sachdev, Henry Brodaty, Arcot Sowmya

From arXiv. 16 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:2207.01822

Healthcare datasets often contain groups of highly correlated features, such as features from the same biological system. When feature selection is applied to these datasets to identify the most important features, the biases inherent in some multivariate feature selectors due to correlated features make it difficult for these methods to distinguish between the important and irrelevant features and the results of the feature selection process can be unstable. Feature selection ensembles, which aggregate the results of multiple individual base feature selectors, have been investigated as a means of stabilising feature selection results, but do not address the problem of correlated features. We present a novel framework to create feature selection ensembles from multivariate feature selectors while taking into account the biases produced by groups of correlated features, using agglomerative hierarchical clustering in a pre-processing step. These methods were applied to two real-world datasets from studies of Alzheimer's disease (AD), a progressive neurodegenerative disease that has no cure and is not yet fully understood. Our results show a marked improvement in the stability of features selected over the models without clustering, and the features selected by these models are in keeping with the findings in the AD literature.
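
The pipeline described in the abstract (cluster correlated features first, then aggregate several runs of a multivariate selector) can be illustrated with a minimal sketch. This is not the authors' implementation. The correlation-distance threshold, the use of the first feature in each cluster as its representative, the least-squares base selector, and the bootstrap selection-frequency aggregation are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def cluster_correlated_features(X, threshold=0.3):
    """Agglomerative hierarchical clustering of features on the distance 1 - |r|.

    `threshold` (an assumed value) is the distance at which the tree is cut.
    Returns one cluster label per feature.
    """
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0  # remove tiny floating-point asymmetry
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=threshold, criterion="distance")


def cluster_representatives(labels):
    """Assumed rule: keep the first feature of each cluster as its representative."""
    return np.array([np.flatnonzero(labels == c)[0] for c in np.unique(labels)])


def ensemble_selection_frequency(X, y, n_select, n_bootstraps=50, seed=0):
    """Run a simple multivariate selector on bootstrap samples and aggregate.

    Base selector (a stand-in): least-squares fit on standardised features,
    keeping the `n_select` largest absolute coefficients.
    Returns the fraction of bootstraps in which each feature was selected.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_bootstraps):
        idx = rng.integers(0, n_samples, size=n_samples)
        Xb, yb = X[idx], y[idx]
        Xb = (Xb - Xb.mean(axis=0)) / (Xb.std(axis=0) + 1e-12)
        coef, *_ = np.linalg.lstsq(Xb, yb - yb.mean(), rcond=None)
        counts[np.argsort(-np.abs(coef))[:n_select]] += 1
    return counts / n_bootstraps


# Synthetic data: features 0-2 are noisy copies of one latent signal,
# features 3-4 are noisy copies of another, features 5-9 are pure noise.
rng = np.random.default_rng(42)
z1, z2 = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack(
    [z1 + 0.1 * rng.normal(size=200) for _ in range(3)]
    + [z2 + 0.1 * rng.normal(size=200) for _ in range(2)]
    + [rng.normal(size=200) for _ in range(5)]
)
y = 2.0 * z1 + 0.5 * rng.normal(size=200)

labels = cluster_correlated_features(X)       # pre-processing step
reps = cluster_representatives(labels)        # one feature per cluster
freq = ensemble_selection_frequency(X[:, reps], y, n_select=2)

for feature_index, f in zip(reps, freq):
    print(f"feature {feature_index}: selected in {f:.0%} of bootstraps")
```

Because each correlated group is reduced to a single representative before selection, the base selector no longer has to split importance among near-duplicate columns, which is the source of instability the abstract refers to.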

Related content

Feature selection

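Feature selection, also called feature subset selection (FSS) or attribute selection, means choosing N features from an existing set of M features so that a specified criterion of the system is optimised. It is the process of picking the most effective features from the original set in order to reduce the dimensionality of the dataset. It is an important way to improve the performance of learning algorithms and a key data pre-processing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.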
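
As a small illustration of choosing N features out of M against a criterion, the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top N. The criterion, the function name `select_top_n`, and the synthetic data are assumptions chosen for brevity. This is a simple univariate filter, not the multivariate ensemble approach of the paper above.

```python
import numpy as np


def select_top_n(X, y, n):
    """Return the indices of the n features with the largest |Pearson r| with y,
    together with the score of every feature."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return np.argsort(-scores)[:n], scores


# M = 6 candidate features, of which only features 1 and 4 drive the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 6))
y_demo = 3.0 * X_demo[:, 1] - 2.0 * X_demo[:, 4] + rng.normal(scale=0.5, size=300)

chosen, scores = select_top_n(X_demo, y_demo, n=2)  # N = 2
print("selected feature indices:", sorted(chosen.tolist()))
```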
