Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
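To illustrate the greedy conditional-mutual-information criterion described above, here is a minimal sketch for fully discrete data, where CMI can be estimated directly from empirical counts via I(y; x_i | x_S) = H(y | x_S) - H(y | x_S, x_i). All function names and the toy dataset are illustrative assumptions, not from the paper; note the paper's actual method replaces this oracle-style computation with a learned, amortized policy network.

```python
import math
import random
from collections import Counter

def cond_entropy(rows, y_idx, cond_idx):
    """Empirical conditional entropy H(y | x_cond) in bits, for discrete data."""
    n = len(rows)
    joint = Counter((tuple(r[j] for j in cond_idx), r[y_idx]) for r in rows)
    marg = Counter(tuple(r[j] for j in cond_idx) for r in rows)
    # H(y | x) = -sum_{x,y} p(x, y) * log2 p(y | x)
    return -sum((c / n) * math.log2(c / marg[cond]) for (cond, _), c in joint.items())

def greedy_cmi_select(rows, y_idx, candidates, k):
    """Greedily pick k features, each maximizing I(y; x_i | x_S)."""
    selected = []
    for _ in range(k):
        base = cond_entropy(rows, y_idx, selected)
        gains = {i: base - cond_entropy(rows, y_idx, selected + [i])
                 for i in candidates if i not in selected}
        selected.append(max(gains, key=gains.get))
    return selected

# Toy dataset (illustrative): x1 duplicates x0; x2 is a weak, independent signal.
random.seed(0)
rows = []
for _ in range(2000):
    x0 = random.randint(0, 1)
    y = x0 if random.random() < 0.9 else 1 - x0   # y strongly tied to x0
    x1 = x0                                        # redundant copy of x0
    x2 = y if random.random() < 0.6 else 1 - y     # weakly informative noise
    rows.append((x0, x1, x2, y))

sel = greedy_cmi_select(rows, y_idx=3, candidates=[0, 1, 2], k=2)
print(sel)
```

The conditioning is what makes the criterion dynamic: after one copy of x0 is chosen, the redundant copy has exactly zero conditional gain, so the greedy policy moves on to the weakly informative x2 rather than re-querying redundant information.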