We address the cost-sensitive feature acquisition problem, in which misclassifying an instance is costly but the expected misclassification cost can be reduced by acquiring the values of missing features. Because acquiring features is itself costly, the objective is to acquire the set of features that minimizes the sum of the feature acquisition cost and the misclassification cost. We describe the Value of Information Lattice (VOILA), an optimal and efficient framework for feature subset acquisition. Unlike the common practice of acquiring features greedily, VOILA can reason about subsets of features. VOILA searches the space of possible feature subsets efficiently by discovering and exploiting conditional independence properties between the features, and it reuses probabilistic inference computations to speed up the search further. Through empirical evaluation on five medical datasets, we show that the greedy strategy is often reluctant to acquire features because it cannot forecast the benefit of acquiring multiple features in combination.
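The objective described above, and the reason a greedy strategy can be reluctant to acquire features, can be illustrated with a toy sketch. This is not VOILA itself: it uses brute-force enumeration of subsets, with no lattice and no conditional independence pruning. The two-feature XOR distribution, the cost values, and all function names are assumptions made for illustration only.

```python
from itertools import combinations, product

# Hypothetical toy problem: two binary features, class y = x1 XOR x2.
FEATURES = ("x1", "x2")
FEATURE_COST = {"x1": 1.0, "x2": 1.0}   # assumed acquisition costs
MISCLASSIFICATION_COST = 10.0           # assumed cost of any wrong label

# Joint distribution P(x1, x2, y) with independent, uniform features.
JOINT = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}


def expected_misclassification_cost(subset):
    """Expected cost of the best prediction after observing `subset`."""
    idx = [FEATURES.index(f) for f in subset]
    mass = {}  # observed values -> [P(obs, y=0), P(obs, y=1)]
    for (x1, x2, y), p in JOINT.items():
        values = (x1, x2)
        key = tuple(values[i] for i in idx)
        mass.setdefault(key, [0.0, 0.0])[y] += p
    # Predicting the more likely class leaves the smaller mass as error.
    return MISCLASSIFICATION_COST * sum(min(m) for m in mass.values())


def total_cost(subset):
    """Acquisition cost plus expected misclassification cost."""
    acquisition = sum(FEATURE_COST[f] for f in subset)
    return acquisition + expected_misclassification_cost(subset)


def greedy_subset():
    """Add one feature at a time; stop when no single feature helps."""
    chosen = ()
    while True:
        candidates = [chosen + (f,) for f in FEATURES if f not in chosen]
        if not candidates:
            return chosen
        best = min(candidates, key=total_cost)
        if total_cost(best) >= total_cost(chosen):
            return chosen
        chosen = best


def best_subset():
    """Exhaustively evaluate every feature subset (exponential in general)."""
    subsets = [c for r in range(len(FEATURES) + 1)
               for c in combinations(FEATURES, r)]
    return min(subsets, key=total_cost)


if __name__ == "__main__":
    for name, subset in (("greedy", greedy_subset()),
                         ("subset search", best_subset())):
        print(name, subset, total_cost(subset))
```

In this toy setting neither feature alone changes the class posterior, so the greedy procedure acquires nothing and accepts a total cost of 5.0, whereas evaluating the pair jointly yields a total cost of 2.0. Exhaustive enumeration scales exponentially with the number of features, which is the cost that a structured search over subsets is meant to avoid.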