Feature selection, a vital dimensionality reduction technique, reduces data dimensionality by identifying an essential subset of input features, which facilitates interpretable insights into learning and inference processes. Algorithmic stability is a key characteristic of an algorithm regarding its sensitivity to perturbations of input samples. In this paper, we propose a novel unsupervised feature selection algorithm that attains this stability with provable guarantees. The architecture of our algorithm consists of a feature scorer and a feature selector. The scorer trains a neural network (NN) to globally score all features, and the selector adopts a dependent sub-NN to locally evaluate the representation abilities of candidate features for selection. Further, we present an algorithmic stability analysis and show that our algorithm has a performance guarantee in the form of a generalization error bound. Extensive experiments on real-world datasets demonstrate the superior generalization performance of our algorithm over strong baseline methods. The properties revealed by our theoretical analysis and the stability of the features selected by our algorithm are also empirically confirmed.
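To make the scorer/selector architecture concrete, the following is a minimal sketch in PyTorch. It is not the paper's implementation: the module names (FeatureScorer, FeatureSelector), the network widths, and the use of a reconstruction loss as the unsupervised criterion for locally evaluating a selected subset are all illustrative assumptions; the paper's actual objective and sub-NN coupling may differ.

```python
import torch
import torch.nn as nn

class FeatureScorer(nn.Module):
    """Globally scores all input features with a small NN (hypothetical design)."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_features),
        )

    def forward(self, x):
        # Average per-sample scores into one global importance score per feature.
        return torch.sigmoid(self.net(x)).mean(dim=0)

class FeatureSelector(nn.Module):
    """Locally evaluates a candidate subset via reconstruction (an assumed
    unsupervised criterion; the paper's exact sub-NN objective may differ)."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_features),
        )

    def forward(self, x, mask):
        # Reconstruct all features from the masked (selected) subset only.
        return self.decoder(x * mask)

# Usage sketch: score features globally, keep the top-k, then measure how well
# the selected subset represents the full input (lower loss = better subset).
x = torch.randn(128, 20)                 # 128 unlabeled samples, 20 features
scorer, selector = FeatureScorer(20), FeatureSelector(20)
scores = scorer(x)                       # one score per feature
topk = scores.topk(k=5).indices          # indices of the 5 top-scored features
mask = torch.zeros(20).scatter_(0, topk, 1.0)
loss = nn.functional.mse_loss(selector(x, mask), x)
```

In this sketch the selector depends on the scorer through the mask, mirroring the paper's description of a dependent sub-NN that evaluates the representation abilities of the features chosen by the global scores.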