We initiate the study of active learning algorithms for classifying strategic agents. Active learning is a well-established framework in machine learning in which the learner selectively queries labels, often achieving substantially higher accuracy and efficiency than classical supervised methods-especially in settings where labeling is costly or time-consuming, such as hiring, admissions, and loan decisions. Strategic classification, however, addresses scenarios where agents modify their features to obtain more favorable outcomes, resulting in observed data that is not truthful. Such manipulation introduces challenges beyond those in learning from clean data. Our goal is to design active and noise-tolerant algorithms that remain effective in strategic environments-algorithms that classify strategic agents accurately while issuing as few label requests as possible. The central difficulty is to simultaneously account for strategic manipulation and preserve the efficiency gains of active learning. Our main result is an algorithm for actively learning linear separators in the strategic setting that preserves the exponential improvement in label complexity over passive learning previously obtained only in the non-strategic case. Specifically, for data drawn uniformly from the unit sphere, we show that a modified version of the Active Perceptron algorithm [DKM05,YZ17] achieves excess error $ε$ using only $\tilde{O}(d \ln \frac{1}ε)$ label queries and incurs at most $\tilde{O}(d \ln \frac{1}ε)$ additional mistakes relative to the optimal classifier, even in the nonrealizable case, when a $\tildeΩ(ε)$ fraction of inputs have inconsistent labels with the optimal classifier. The algorithm is computationally efficient and, under these distributional assumptions, requires substantially fewer label queries than prior work on strategic Perceptron [ABBN21].
翻译:我们首次研究了用于分类策略性智能体的主动学习算法。主动学习是机器学习中一个成熟的框架,其中学习器选择性地查询标签,通常比传统的监督方法获得显著更高的准确性和效率——特别是在标签标注成本高昂或耗时的场景中,例如招聘、录取和贷款决策。然而,策略分类处理的是智能体修改其特征以获得更有利结果的情况,导致观察到的数据不真实。这种操纵引入了超越从干净数据中学习所面临的挑战。我们的目标是设计在策略环境中仍保持有效的主动且抗噪声的算法——这些算法能够准确分类策略性智能体,同时尽可能减少标签请求的数量。核心难点在于同时考虑策略性操纵并保持主动学习的效率增益。我们的主要成果是一种在策略设置下主动学习线性分离器的算法,该算法保持了在标签复杂度上相对于被动学习的指数级改进,这一改进先前仅在非策略情况下获得。具体而言,对于从单位球面上均匀抽取的数据,我们证明了一种改进版的主动感知机算法 [DKM05,YZ17] 能够仅使用 $\tilde{O}(d \ln \frac{1}ε)$ 次标签查询实现超额误差 $ε$,并且相对于最优分类器最多产生 $\tilde{O}(d \ln \frac{1}ε)$ 次额外错误,即使在不可实现的情况下,当有 $\tildeΩ(ε)$ 比例的输入与最优分类器的标签不一致时也是如此。该算法计算效率高,并且在上述分布假设下,所需的标签查询数量显著少于先前关于策略感知机的研究 [ABBN21]。