There are two main attack models considered in the adversarial robustness literature: black-box and white-box. We view these threat models as two ends of a fine-grained spectrum, indexed by the number of queries the adversary can make. Using this point of view, we investigate how many queries the adversary needs to design an attack that is comparable to the best possible attack in the white-box model. We give a lower bound on that number of queries in terms of the entropy of the decision boundaries of the classifier. Using this result, we analyze two classical learning algorithms on two synthetic tasks for which we prove meaningful security guarantees. The obtained bounds suggest that some learning algorithms are inherently more robust against query-bounded adversaries than others.