In the context of supervised parametric models, we introduce the concept of e-values. An e-value is a scalar quantity that measures how close the sampling distribution of parameter estimates in a model trained on a subset of features is to that of the model trained on all features (i.e., the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. E-values are applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure based on e-values, and we provide consistency results. For a $p$-dimensional feature space, this procedure requires fitting only the full model and evaluating $p+1$ models, as opposed to the traditional requirement of fitting and evaluating $2^p$ models. Through experiments across several model settings and on both synthetic and real datasets, we establish the e-values method as a promising general alternative to existing model-specific methods of feature selection.
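To make the "fit the full model once, evaluate $p+1$ models" idea concrete, the following is a minimal sketch for ordinary least squares, not the authors' implementation. Several pieces are assumptions made for illustration: the names `mahalanobis_depth` and `select_features` are hypothetical; Mahalanobis depth is used as one possible data depth; a Gaussian approximation around the full-model estimate stands in for the fast resampling algorithm; `tau` is a hypothetical inflation factor for the resampling spread; and the selection rule (keep feature $j$ if zeroing its coefficient lowers the e-value below that of the full model) is one plausible reading of the rank-ordering statement above.

```python
import numpy as np


def mahalanobis_depth(points, reference):
    """Mahalanobis depth of each row of `points` relative to the sample `reference`."""
    center = reference.mean(axis=0)
    cov = np.atleast_2d(np.cov(reference, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    diff = points - center
    # Squared Mahalanobis distance of every row from the reference center.
    dist_sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + dist_sq)


def select_features(X, y, n_draws=1000, tau=None, seed=0):
    """Illustrative e-value style selection for OLS (assumptions noted above)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Hypothetical default: inflate the resampling spread by n^(1/4).
    tau = n ** 0.25 if tau is None else tau

    # Fit the full model once (OLS without intercept).
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma_sq = resid @ resid / (n - p)
    cov_hat = sigma_sq * np.linalg.inv(X.T @ X)

    # Assumption: Gaussian draws approximate the sampling distribution
    # of the full-model estimate; no model is refitted.
    draws = rng.multivariate_normal(beta_hat, tau**2 * cov_hat, size=n_draws)

    # Model 1 of p+1: the full model, scored against its own distribution.
    full_evalue = mahalanobis_depth(draws, draws).mean()

    # Models 2..p+1: drop one feature by zeroing its coefficient in every draw.
    drop_evalues = np.empty(p)
    for j in range(p):
        dropped = draws.copy()
        dropped[:, j] = 0.0
        drop_evalues[j] = mahalanobis_depth(dropped, draws).mean()

    # Keep features whose removal pushes the estimates away from the full model.
    selected = [j for j in range(p) if drop_evalues[j] < full_evalue]
    return selected, full_evalue, drop_evalues


# Synthetic example: only the first two of five features carry signal.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(500, 5))
beta_true = np.array([2.0, -3.0, 0.0, 0.0, 0.0])
y = X @ beta_true + data_rng.normal(size=500)

selected, full_evalue, drop_evalues = select_features(X, y)
print("selected features:", selected)
print("full-model e-value:", full_evalue)
print("drop-one e-values:", drop_evalues)
```

The loop evaluates exactly $p$ drop-one models in addition to the full model, so the cost grows linearly in $p$ rather than as $2^p$. Zeroing a coefficient that is truly needed moves the draws far from the full-model distribution and lowers the mean depth, whereas zeroing an unneeded coefficient keeps them near the center.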