Active learning with strong and weak labelers considers a practical setting in which we have access to both costly but accurate strong labelers and cheap but inaccurate predictions provided by weak labelers. We study this problem in the streaming setting, where decisions must be made \textit{online}. We design a novel algorithmic template, Weak Labeler Active Cover (WL-AC), that robustly leverages the lower-quality weak labelers to reduce the query complexity while retaining the desired level of accuracy. Prior active learning algorithms with access to weak labelers learn a difference classifier, which predicts where the weak labels differ from the strong labels; this requires the strong assumption that the difference classifier is realizable (Zhang and Chaudhuri, 2015). WL-AC bypasses this \textit{realizability} assumption and is thus applicable to many real-world scenarios, such as randomly corrupted weak labels and high-dimensional families of difference classifiers (\textit{e.g.}, deep neural nets). Moreover, WL-AC trades off evaluating the quality of the weak labelers against fully exploiting them, which makes it possible to convert any active learning strategy into one that can leverage weak labelers. We provide an instantiation of this template that achieves the optimal query complexity for any given weak labeler, without knowing its accuracy a priori. Empirically, we propose an instantiation of the WL-AC template that can be implemented efficiently for large-scale models (\textit{e.g.}, deep neural nets), and we show its effectiveness on the corrupted-MNIST dataset, where it significantly reduces the number of labels while matching the accuracy of passive learning.
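To make the trade-off between evaluating and exploiting a weak labeler concrete, the following is a minimal toy sketch in Python. It is not the WL-AC algorithm and carries none of its guarantees. It only illustrates the general idea of wrapping an arbitrary streaming query rule so that weak labels are used once occasional strong-label audits suggest the weak labeler is accurate enough. The function name `stream_with_weak_labeler`, the parameters `audit_prob`, `max_weak_error`, and `min_audits`, and the audit rule itself are all assumptions made for illustration.

```python
import random


def stream_with_weak_labeler(stream, strong, weak, wants_label,
                             audit_prob=0.1, max_weak_error=0.1,
                             min_audits=20, seed=0):
    """Toy streaming loop (illustrative only, NOT the paper's WL-AC).

    stream      : iterable of examples
    strong/weak : callables example -> label (strong is costly, weak is cheap)
    wants_label : callable example -> bool, the base active learner's query rule
    Returns (labeled, n_strong), where labeled is a list of (x, y, source).
    """
    rng = random.Random(seed)
    audits = 0          # strong labels collected so far
    disagreements = 0   # how often the weak label differed from the strong one
    n_strong = 0
    labeled = []
    for x in stream:
        if not wants_label(x):
            continue  # the base strategy does not need a label for x
        w = weak(x)
        # Trust the weak labeler only after enough audits show a low error rate.
        trusted = (audits >= min_audits
                   and disagreements / audits <= max_weak_error)
        if trusted and rng.random() >= audit_prob:
            labeled.append((x, w, "weak"))  # exploit the cheap label
            continue
        # Otherwise pay for a strong label and use it to audit the weak one.
        y = strong(x)
        n_strong += 1
        audits += 1
        disagreements += int(w != y)
        labeled.append((x, y, "strong"))
    return labeled, n_strong
```

With an accurate weak labeler, this loop falls back to strong labels only for the initial audits and an `audit_prob` fraction of later queries. With a heavily corrupted weak labeler, the estimated error never drops below `max_weak_error`, so every query goes to the strong labeler and the behavior reduces to that of the base strategy.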