Cost-sensitive Selection of Variables by Ensemble of Model Sequences

Many applications require collecting data on different variables or measurements across many system performance metrics. We refer to these broadly as measures or variables. Collecting data for each measure often incurs a cost, so it is desirable to account for the cost of measures in modeling. This is a fairly new class of problems in the area of cost-sensitive learning. A few attempts have been made to incorporate costs when combining and selecting measures. However, existing studies either do not strictly enforce a budget constraint or are not the most cost-effective. Focusing on classification problems, we propose a computationally efficient approach that can find a near-optimal model under a given budget by exploring the most promising part of the solution space. Instead of outputting a single model, we produce a model schedule: a list of models sorted by model cost and expected predictive accuracy. The schedule can be used to choose the model with the best predictive accuracy under a given budget, or to trade off budget against predictive accuracy. Experiments on several benchmark datasets show that our approach compares favorably to competing methods.
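To make the idea of a model schedule concrete, here is a minimal sketch of how such a list might be built and queried. It is not the authors' algorithm: it assumes candidate models have already been trained and scored by some other procedure, and the names `Candidate`, `build_schedule`, and `best_under_budget` are hypothetical helpers introduced only for illustration. The sketch keeps the candidates that are not dominated (no cheaper or equally priced model is at least as accurate), sorts them by cost, and then picks the most accurate model that fits a budget.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    """A hypothetical candidate model: the variables it uses, their
    total acquisition cost, and its estimated predictive accuracy."""
    variables: Tuple[str, ...]
    cost: float
    accuracy: float


def build_schedule(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates sorted by increasing cost.

    A candidate is kept only if it is strictly more accurate than
    every candidate that costs the same or less.
    """
    schedule: List[Candidate] = []
    best_accuracy = float("-inf")
    # Sort by cost; among equal costs, consider the most accurate first.
    for cand in sorted(candidates, key=lambda c: (c.cost, -c.accuracy)):
        if cand.accuracy > best_accuracy:
            schedule.append(cand)
            best_accuracy = cand.accuracy
    return schedule


def best_under_budget(schedule: List[Candidate],
                      budget: float) -> Optional[Candidate]:
    """Pick the most accurate model whose cost does not exceed the budget.

    Because accuracy increases with cost along the schedule, this is
    the last affordable entry. Returns None if nothing is affordable.
    """
    affordable = [c for c in schedule if c.cost <= budget]
    return affordable[-1] if affordable else None


# Illustrative, made-up numbers (not results from the paper).
candidates = [
    Candidate(("a",), 1.0, 0.70),
    Candidate(("b",), 2.0, 0.65),      # dominated by ("a",)
    Candidate(("a", "b"), 3.0, 0.80),
    Candidate(("a", "c"), 3.0, 0.75),  # dominated by ("a", "b")
]
schedule = build_schedule(candidates)
choice = best_under_budget(schedule, budget=2.5)
```

Reading the schedule from cheapest to most expensive also shows how much accuracy each additional unit of budget buys, which is the trade-off use the abstract describes.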

