We study the problem of online binary classification where strategic agents can manipulate their observable features in predefined ways, modeled by a manipulation graph, in order to receive a positive classification. We show this setting differs in fundamental ways from non-strategic online classification. For instance, whereas in the non-strategic case, a mistake bound of $\ln|H|$ is achievable via the halving algorithm when the target function belongs to a known class $H$, we show that no deterministic algorithm can achieve a mistake bound $o(\Delta)$ in the strategic setting, where $\Delta$ is the maximum degree of the manipulation graph (even when $|H|=O(\Delta)$). We obtain an algorithm achieving mistake bound $O(\Delta\ln|H|)$. We also extend this to the agnostic setting and obtain an algorithm with a $\Delta$ multiplicative regret, and we show no deterministic algorithm can achieve $o(\Delta)$ multiplicative regret. Next, we study two randomized models based on whether the random choices are made before or after agents respond, and show they exhibit fundamental differences. In the first model, at each round the learner deterministically chooses a probability distribution over classifiers inducing expected values on each vertex (probabilities of being classified as positive), which the strategic agents respond to. We show that any learner in this model has to suffer linear regret. On the other hand, in the second model, while the adversary who selects the next agent must respond to the learner's probability distribution over classifiers, the agent then responds to the actual hypothesis classifier drawn from this distribution. Surprisingly, we show this model is more advantageous to the learner, and we design randomized algorithms that achieve sublinear regret bounds against both oblivious and adaptive adversaries.
翻译:我们研究了在线二进制分类的问题, 战略代理商可以以预定义的方式, 以操纵图为模型, 来操纵其可见的特性, 以便获得积极的分类。 我们展示了这一设置与非战略在线分类有根本的不同。 例如, 在非战略情况下, 当目标函数属于已知的等级 $H$ 时, 以$ $ 折算算算法来完成一个一半的算法, 当目标函数属于已知的等级 $H 美元时, 我们发现, 在战略环境中, 任何确定式算法都不可能实现一个错误 $o (\ Delta) 的绑定, 美元是操纵图的最大程度( 即使当 $H<unk> O (\ Delta) 美元 (\ Delta) 美元) 。 我们得到了一个与非战略在线在线分类有根本差别的设置 。 我们得到了一个算法算法, 将这个算法结合到 $\ Delta $ 倍的算法 。 我们显示, 确定模式的计算方法不会达到 $ (\ Deltaildrideal drideal devode) 。</s>