We study the fundamental problem of selecting optimal features for model construction. This problem is computationally challenging on large datasets, even when greedy algorithm variants are used. To address this challenge, we extend the adaptive query model, recently proposed for greedy forward selection of submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions. The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work. Furthermore, our extension allows the use of downward-closed constraints, which can encode certain fairness criteria into the feature selection process. We prove strong approximation guarantees for the algorithm under standard assumptions. These guarantees apply to many parametric models, including Generalized Linear Models. Finally, we demonstrate empirically that the proposed algorithm competes favorably with state-of-the-art feature selection techniques on real-world and synthetic datasets.
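To ground the discussion, the following is a minimal sketch of classical sequential Orthogonal Matching Pursuit for feature selection, the baseline the abstract builds on. This is an illustration only: the paper's contribution is a parallel, low-adaptivity variant supporting downward-closed constraints, which this sequential sketch does not implement. The function name `omp_select` and the budget parameter `k` are illustrative choices, not notation from the paper.

```python
import numpy as np

def omp_select(X, y, k):
    """Sequential Orthogonal Matching Pursuit (illustrative sketch).

    Greedily picks k columns of X to explain y: each round selects the
    feature most correlated with the current residual, refits least
    squares on the selected set, and updates the residual.
    """
    residual = y.astype(float).copy()
    selected = []
    for _ in range(k):
        # Score every feature by absolute correlation with the residual;
        # mask already-selected features so they are not chosen twice.
        scores = np.abs(X.T @ residual)
        scores[selected] = -np.inf
        selected.append(int(np.argmax(scores)))
        # Refit on the selected columns and recompute the residual
        # (the "orthogonal" step that distinguishes OMP from plain MP).
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    return selected
```

In the adaptive query model studied in the paper, the bottleneck of this loop is its k sequential rounds; the proposed algorithm reduces the number of adaptive rounds exponentially by evaluating many candidate features in parallel per round.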