High-dimensional imbalanced data poses a machine learning challenge. In the absence of sufficient or high-quality labels, unsupervised feature selection methods are crucial for the success of subsequent algorithms. Therefore, there is a growing need for unsupervised feature selection algorithms focused on imbalanced data. Thus, we propose a Marginal Laplacian Score (MLS) a modification of the well-known Laplacian Score (LS) to be better suited for imbalance data. We introduce an assumption that the minority class or anomalous appear more frequently in the margin of the features. Consequently, MLS aims to preserve the local structure of the data set's margin. As MLS is better suited for handling imbalanced data, we propose its integration into modern feature selection methods that utilize the Laplacian score. We integrate the MLS algorithm into the Differentiable Unsupervised Feature Selection (DUFS), resulting in DUFS-MLS. The proposed methods demonstrate robust and improved performance on synthetic and public data sets.
翻译:暂无翻译