Given $k$ pre-trained classifiers and a stream of unlabeled examples, how can we actively decide when to query a label so that we can distinguish the best model from the rest while making only a small number of queries? Answering this question has a profound impact on a range of practical scenarios. In this work, we design an online selective sampling approach that actively selects informative examples to label and, at any round, outputs the best model with high probability. Our algorithm can be used for online prediction on both adversarial and stochastic streams. We establish several theoretical guarantees for our algorithm and demonstrate its effectiveness extensively in experiments.
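The abstract does not specify the query rule, so the following is only a minimal sketch of one generic technique for this problem setting: disagreement-based selective sampling with confidence-based elimination. It is not the authors' algorithm. It rests on one observation: if every surviving model predicts the same label for an example, their relative error counts cannot change, so that label is not worth querying. The function `select_best_model`, the `oracle` callback, the `delta` parameter, the Hoeffding-style elimination margin, and the threshold classifiers in the usage example are all hypothetical. The margin argument roughly assumes a stochastic stream and does not, as written, cover adversarial streams.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a pre-trained classifier maps an input to a label.
Classifier = Callable[[float], int]


def select_best_model(
    models: Sequence[Classifier],
    stream: Sequence[float],
    oracle: Callable[[float], int],
    delta: float = 0.05,
) -> Tuple[int, int]:
    """Illustrative disagreement-based selective sampling for model selection.

    Returns (index of the current leader, number of labels queried).
    """
    k = len(models)
    active: List[int] = list(range(k))  # models not yet ruled out
    mistakes = [0] * k                  # errors counted on queried rounds only
    queries = 0
    for x in stream:
        preds = {i: models[i](x) for i in active}
        # All surviving models agree: their mistake differences are unchanged
        # whatever the true label is, so skip the query.
        if len(set(preds.values())) <= 1:
            continue
        y = oracle(x)  # pay for one label
        queries += 1
        for i in active:
            mistakes[i] += int(preds[i] != y)
        # Hoeffding-style margin on pairwise mistake differences, with a crude
        # union bound over model pairs and query counts (assumption, not tuned).
        margin = math.sqrt(2 * queries * math.log(2 * k * k * queries * queries / delta))
        best = min(mistakes[i] for i in active)
        # The current leader always survives, so `active` never becomes empty.
        active = [i for i in active if mistakes[i] - best <= margin]
    leader = min(active, key=lambda i: mistakes[i])
    return leader, queries


if __name__ == "__main__":
    rng = random.Random(0)
    stream = [rng.random() for _ in range(2000)]
    # Three hypothetical threshold classifiers; the true decision boundary is 0.5.
    models = [lambda x, th=th: int(x > th) for th in (0.3, 0.5, 0.7)]
    leader, queries = select_best_model(models, stream, oracle=lambda x: int(x > 0.5))
    print(f"leader={leader}, labels queried={queries} of {len(stream)}")
```

In this toy setup, labels are requested only for inputs in $(0.3, 0.7]$, where the thresholds disagree. Once the two imperfect models are eliminated, the remaining model agrees with itself on every input, and querying stops entirely.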