A fundamental problem in adversarial machine learning is to quantify how much training data is needed in the presence of evasion attacks. In this paper we address this issue within the framework of PAC learning, focusing on the class of decision lists. Given that distributional assumptions are essential in the adversarial setting, we work with probability distributions on the input data that satisfy a Lipschitz condition: nearby points have similar probability. Our key results show that the adversary's budget (that is, the number of bits it can flip in each input) is a fundamental quantity in determining the sample complexity of robust learning. Our first main result is a sample-complexity lower bound: the class of monotone conjunctions (essentially the simplest non-trivial hypothesis class on the Boolean hypercube), and hence every superclass of it, has sample complexity at least exponential in the adversary's budget. Our second main result is a corresponding upper bound: for every fixed $k$, the class of $k$-decision lists has polynomial sample complexity against a $\log(n)$-bounded adversary. This sheds further light on the question of whether an efficient PAC learning algorithm can always be used as an efficient $\log(n)$-robust learning algorithm under the uniform distribution.
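To fix notation for the bounds above (a schematic statement, not verbatim from the paper: we write $\rho$ for the adversary's budget and assume the standard exact-in-the-ball formalization of robustness), the $\rho$-robust risk of a hypothesis $h$ against a target concept $c$ under a distribution $D$ on $\{0,1\}^n$ is
\[
\mathrm{R}_\rho(h, c) \;=\; \Pr_{x \sim D}\bigl[\exists z \in B_\rho(x) \colon h(z) \neq c(z)\bigr],
\]
where $B_\rho(x)$ denotes the Hamming ball of radius $\rho$ around $x$. In this notation, the lower bound says that any learner achieving small $\mathrm{R}_\rho$ for monotone conjunctions requires $2^{\Omega(\rho)}$ samples, while the upper bound says that for every fixed $k$, $k$-decision lists are robustly learnable from $\mathrm{poly}(n)$ samples whenever $\rho = O(\log n)$.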