Fast Feature Selection with Fairness Constraints

We study the fundamental problem of selecting optimal features for model construction. This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants. To address this challenge, we extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions. The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work. Furthermore, our extension allows the use of downward-closed constraints, which can be used to encode certain fairness criteria into the feature selection process. We prove strong approximation guarantees for the algorithm based on standard assumptions. These guarantees are applicable to many parametric models, including Generalized Linear Models. Finally, we demonstrate empirically that the proposed algorithm competes favorably with state-of-the-art techniques for feature selection, on real-world and synthetic datasets.
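To make the selection rule concrete, the following is a minimal sketch of plain sequential Orthogonal Matching Pursuit for least squares, with a simple downward-closed constraint of the kind the abstract mentions. It is not the parallel adaptive-query algorithm proposed in the paper. The function name `omp_select` and the parameters `groups` and `cap` are hypothetical, introduced only for illustration: each feature carries a group label, and at most `cap` features may be chosen per group. Any subset of a feasible set remains feasible, so the constraint is downward-closed.

```python
import numpy as np

def omp_select(X, y, k, groups=None, cap=None):
    """Pick at most k columns of X by sequential OMP for least squares.

    groups / cap are hypothetical names for an illustrative constraint:
    no more than `cap` selected features may share a group label.
    """
    d = X.shape[1]
    selected = []
    counts = {}                       # features chosen so far, per group
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(k):
        # Magnitude of the squared-loss gradient for each coefficient.
        scores = np.abs(X.T @ residual)
        # Mask out features that are already chosen or would violate the cap.
        allowed = np.ones(d, dtype=bool)
        for j in selected:
            allowed[j] = False
        if groups is not None and cap is not None:
            for j in range(d):
                if counts.get(groups[j], 0) >= cap:
                    allowed[j] = False
        if not allowed.any():
            break                     # no feasible feature remains
        scores[~allowed] = -np.inf
        j = int(np.argmax(scores))
        selected.append(j)
        if groups is not None:
            counts[groups[j]] = counts.get(groups[j], 0) + 1
        # Refit on the chosen columns and update the residual.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    return selected

# Synthetic example: only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3]
print(omp_select(X, y, k=2))
print(omp_select(X, y, k=2, groups=[0, 0, 0, 0, 1, 1], cap=1))
```

The sketch assumes the columns of `X` are on comparable scales; in practice they would usually be normalized first, since the score compares raw inner products. Each greedy step here depends on the previous one, which is the sequential bottleneck the adaptive query model is meant to reduce.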



Feature selection


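Feature selection, also called feature subset selection (FSS) or attribute selection, means choosing N features out of M existing features so that a specific metric of the system is optimized. It is the process of picking the most effective of the original features in order to reduce the dimensionality of the dataset. It is an important way to improve the performance of a learning algorithm and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are essential to training the model.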
