How can we collect the most useful labels to learn a model selection policy, when presented with arbitrary heterogeneous data streams? In this paper, we formulate this task as an online contextual active model selection problem, where at each round the learner receives an unlabeled data point along with a context. The goal is to output the best model for any given context without obtaining an excessive number of labels. In particular, we focus on the task of selecting pre-trained classifiers, and propose a contextual active model selection algorithm (CAMS), which relies on a novel uncertainty sampling query criterion defined over a given policy class for adaptive model selection. In contrast to prior work, our algorithm does not assume a globally optimal model. We provide a rigorous theoretical analysis of the regret and query complexity under both adversarial and stochastic settings. Our experiments on several benchmark classification datasets demonstrate the algorithm's effectiveness in terms of both regret and query complexity. Notably, to achieve the same accuracy, CAMS incurs less than 10% of the label cost of the best online model selection baselines on CIFAR10.
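To make the online protocol concrete, the following is a minimal sketch of an uncertainty-driven active model selection loop in the spirit of the abstract. The toy classifiers, the softmax policy class, the disagreement-based query probability, and the importance-weighted exponential-weights update are illustrative assumptions only, not the actual CAMS criterion or its analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical setup (not from the paper): K pre-trained classifiers and a
# --- small policy class; each policy maps a context to a distribution over models.
def make_models(K, n_classes):
    """Each 'model' is a stand-in classifier returning a predicted label."""
    return [lambda x, k=k: int((x.sum() * (k + 1)) % n_classes) for k in range(K)]

def policy_probs(policy_id, x, K):
    """Toy policy class: each policy softmax-prefers a different model per context."""
    scores = np.roll(np.arange(K, dtype=float), policy_id) * (1 + x.mean())
    e = np.exp(scores - scores.max())
    return e / e.sum()

K, n_classes, n_policies, T = 5, 3, 4, 200
models = make_models(K, n_classes)
w = np.ones(n_policies)          # exponential weights over policies
eta = 0.5                        # learning rate (assumed, not the paper's schedule)
queries = 0

for t in range(T):
    x = rng.random(8)                                  # context for round t
    y_true = int(x.sum() * 10) % n_classes             # label, revealed only if queried

    # Aggregate the policy class into a single distribution over models.
    pw = w / w.sum()
    q = sum(pw[i] * policy_probs(i, x, K) for i in range(n_policies))

    # Uncertainty-style query criterion (a simplification, not CAMS's criterion):
    # query more often when the recommended models disagree on this context.
    preds = np.array([m(x) for m in models])
    top = preds[np.argmax(q)]
    disagreement = q[preds != top].sum()
    query_prob = max(disagreement, 1.0 / (t + 1))      # keep a vanishing floor

    if rng.random() < query_prob:
        queries += 1
        losses = (preds != y_true).astype(float)       # 0-1 loss per model
        # Importance-weighted exponential-weights update over the policy class.
        for i in range(n_policies):
            exp_loss = policy_probs(i, x, K) @ losses / query_prob
            w[i] *= np.exp(-eta * exp_loss / T)

print(f"labels queried: {queries} / {T}")
```

The sketch illustrates the two ingredients highlighted in the abstract: a query rule that spends labels only where the policy-weighted models disagree, and an online update of the policy class from the labels actually obtained.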