In this work, we study the possibility of defending against "data-poisoning" attacks while learning a neural net. We focus on the supervised learning setup for a class of finite-sized depth-2 nets, which includes standard single-filter convolutional nets. For this setup, we attempt to learn the true label-generating weights in the presence of a malicious oracle that applies stochastic, bounded, additive adversarial distortions to the true labels accessed by the algorithm during training. For the non-gradient stochastic algorithm that we instantiate, we prove (nearly worst-case optimal) trade-offs among the magnitude of the adversarial attack, the accuracy, and the confidence achieved by the proposed algorithm. Additionally, our algorithm uses mini-batching, and we track how the mini-batch size affects convergence.
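To make the threat model concrete, the following is a minimal sketch of the label oracle described above: labels come from a fixed depth-2 single-filter net, but the oracle returns them with a stochastic additive distortion bounded by an attack budget `theta`. All names (`true_net`, `poisoned_label`, `w_star`, `theta`) and the specific ReLU architecture are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_net(x, w):
    """A hypothetical depth-2 single-filter conv net: the filter w is
    applied to every sliding patch of x, followed by ReLU and averaging."""
    k = w.shape[0]
    patches = np.lib.stride_tricks.sliding_window_view(x, k)  # (n-k+1, k)
    return np.maximum(patches @ w, 0.0).mean()

def poisoned_label(x, w_star, theta):
    """Malicious oracle: the true label plus a stochastic additive
    distortion whose magnitude is bounded by the attack budget theta."""
    xi = rng.uniform(-theta, theta)  # bounded adversarial noise
    return true_net(x, w_star) + xi

# A training sample as seen through the oracle.
w_star = np.array([1.0, -0.5, 0.25])  # true label-generating filter
x = rng.normal(size=10)
y_clean = true_net(x, w_star)
y_seen = poisoned_label(x, w_star, theta=0.1)
assert abs(y_seen - y_clean) <= 0.1  # distortion respects the budget
```

The learner only ever observes `y_seen`; the trade-offs in the abstract relate the budget `theta` to the accuracy and confidence with which the true filter can still be recovered.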