Deep learning systems are known to be vulnerable to adversarial examples. In particular, query-based black-box attacks do not require knowledge of the deep learning model, but can compute adversarial examples over the network by submitting queries and inspecting the returned results. Recent work has greatly improved the efficiency of these attacks, demonstrating their practicality on today's ML-as-a-service platforms. We propose Blacklight, a new defense against query-based black-box adversarial attacks. The fundamental insight driving our design is that, to compute adversarial examples, these attacks perform iterative optimization over the network, producing image queries that are highly similar in the input space. Blacklight detects query-based black-box attacks by detecting these highly similar queries, using an efficient similarity engine operating on probabilistic content fingerprints. We evaluate Blacklight against eight state-of-the-art attacks, across a variety of models and image classification tasks. Blacklight identifies them all, often after only a handful of queries. By rejecting all detected queries, Blacklight prevents any attack from completing, even when attackers persist in submitting queries after account bans or query rejections. Blacklight is also robust against several powerful countermeasures, including an optimal black-box attack that approximates white-box attacks in efficiency. Finally, we illustrate how Blacklight generalizes to other domains like text classification.
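The core idea of detecting highly similar queries via probabilistic content fingerprints can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual algorithm: the function names, and the quantization, windowing, and threshold parameters, are hypothetical placeholders.

```python
import hashlib

def fingerprint(pixels, window=8, step=4, q=16, top_k=20):
    """Illustrative probabilistic content fingerprint: quantize pixel
    values (so tiny perturbations collapse to the same value), hash
    overlapping windows, and keep only the top_k smallest hashes as a
    compact probabilistic summary. All parameters are hypothetical."""
    quantized = [p // q for p in pixels]
    hashes = []
    for i in range(0, len(quantized) - window + 1, step):
        chunk = bytes(quantized[i:i + window])
        hashes.append(hashlib.sha256(chunk).hexdigest())
    return set(sorted(hashes)[:top_k])

def is_attack_query(fp, history, threshold=0.5):
    """Flag a query whose fingerprint overlaps any previously seen
    fingerprint beyond the threshold fraction, suggesting it belongs
    to an iterative black-box attack sequence."""
    for prior in history:
        if len(fp & prior) / max(len(fp), 1) >= threshold:
            return True
    return False
```

Because iterative black-box attacks submit queries that differ only by small perturbations, their quantized windows, and hence their fingerprints, overlap heavily, while independent benign images share almost no hashes.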