Many adversarial attacks and defenses have recently been proposed for Deep Neural Networks (DNNs). While most of them assume the white-box setting, which is impractical, a new class of query-based hard-label (QBHL) black-box attacks poses a significant threat to real-world applications (e.g., Google Cloud, Tencent API). To date, no generalizable and practical approach has been proposed to defend against such attacks. This paper proposes and evaluates PredCoin, a practical and generalizable method for providing robustness against QBHL attacks. PredCoin poisons the gradient estimation step, an essential component of most QBHL attacks: it identifies gradient estimation queries crafted by an attacker and introduces uncertainty into their outputs. Extensive experiments show that PredCoin successfully defends against four state-of-the-art QBHL attacks across various settings and tasks while preserving the target model's overall accuracy. PredCoin is also shown to be robust and effective against several defense-aware attacks, which may have full knowledge of PredCoin's internal mechanisms.
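To make the mechanism described above concrete, the following is a minimal sketch, not PredCoin's actual implementation. It assumes a toy linear binary classifier queried through a hard-label oracle, a HopSkipJump-style Monte Carlo sign estimate standing in for the gradient estimation step of a QBHL attack, and a simple randomized defense in the spirit of the abstract. All identifiers (`oracle`, `estimate_gradient_direction`, `RandomizedDefense`) are hypothetical names introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: a linear binary classifier (stand-in for the target DNN).
w = rng.standard_normal(10)

def oracle(x):
    """Hard-label API: the attacker sees only the class, never the scores."""
    return int(w @ x > 0)

def estimate_gradient_direction(query, x, n=200, delta=0.05):
    """Monte Carlo direction estimate typical of QBHL attacks: probe the
    hard-label oracle at x + delta*u_i for random unit vectors u_i and
    average the sign-weighted perturbations."""
    u = rng.standard_normal((n, x.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    phi = np.array([2 * query(x + delta * ui) - 1 for ui in u])  # labels -> +/-1
    return (phi[:, None] * u).mean(axis=0)

class RandomizedDefense:
    """Hypothetical defense sketch (not PredCoin itself): gradient estimation
    queries arrive as tight clusters around a common point, so any query that
    lands very close to a recent one is answered with a coin-flipped label,
    poisoning the attacker's estimate."""
    def __init__(self, base_oracle, radius=0.1, flip_p=0.5, history=500):
        self.base, self.radius, self.flip_p = base_oracle, radius, flip_p
        self.recent, self.history = [], history

    def __call__(self, x):
        suspicious = any(np.linalg.norm(x - q) < self.radius for q in self.recent)
        self.recent = (self.recent + [x.copy()])[-self.history:]
        y = self.base(x)
        if suspicious and rng.random() < self.flip_p:
            return 1 - y  # randomized answer for suspected estimation queries
        return y

# Place the query point on the decision boundary, where gradient
# estimation is meaningful for the attacker.
x0 = rng.standard_normal(10)
x0 -= (w @ x0) / (w @ w) * w

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
clean = estimate_gradient_direction(oracle, x0)
defended = estimate_gradient_direction(RandomizedDefense(oracle), x0)
print("alignment with true boundary normal, undefended:", round(cos(clean, w), 3))
print("alignment with true boundary normal, defended:  ", round(cos(defended, w), 3))
```

Under these assumptions, the undefended estimate aligns strongly with the true boundary normal, while the defended one degrades toward noise, illustrating why introducing uncertainty into the outputs of suspected gradient estimation queries undermines QBHL attacks.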