Efficient PAC learning of threshold functions is arguably one of the most important problems in machine learning. With the unprecedented growth of large-scale data sets, it has become ubiquitous to appeal to the wisdom of the crowd for data annotation, and the central problem that has attracted a surge of recent interest is how to learn the underlying hypothesis from highly noisy crowd annotations while keeping the annotation cost under control. On the other hand, a large body of recent work has investigated learning not only from labels but also from pairwise comparisons, since in many applications it is easier to compare than to label. In this paper, we study the problem of PAC learning threshold functions from the crowd, where the annotators can provide (noisy) labels or pairwise comparison tags. We design a label-efficient algorithm that interleaves learning and annotation, which leads to a constant overhead (a notion that characterizes the query complexity of our algorithm). In contrast, the natural approach of annotation followed by learning incurs an overhead that grows with the sample size.
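For concreteness, the sketch below spells out the objects behind the abstract under the assumption of one-dimensional instances; the notation ($h_w$, $Z$, $m_{\epsilon,\delta}$) is ours, not necessarily the paper's, and the overhead ratio follows the common convention in the crowd-PAC literature of normalizing the total query count by the standard PAC sample complexity.

\[
  \mathcal{H} = \{ h_w : w \in \mathbb{R} \}, \qquad h_w(x) = \mathbf{1}[x \ge w],
\]
\[
  \text{label query: } y(x) \in \{0,1\} \ \text{(a possibly noisy report of } h_{w^*}(x)\text{)}, \qquad
  \text{comparison query: } Z(x, x') \in \{\pm 1\} \ \text{(a possibly noisy report of } \operatorname{sign}(x - x')\text{)},
\]
\[
  \text{overhead} \;=\; \frac{\#\,\text{queries issued to the crowd}}{m_{\epsilon,\delta}}, \qquad
  m_{\epsilon,\delta} := \text{PAC sample complexity of learning } \mathcal{H} \text{ without the crowd}.
\]

Under this reading, a constant overhead means the interleaved algorithm asks the crowd only a constant factor more questions than the number of samples a noise-free PAC learner would need, whereas annotating the entire data set up front requires a number of queries per sample that grows with the sample size.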