In this work, we study the possibility of defending against data-poisoning attacks while training a shallow neural network in a regression setup. We focus on supervised learning for a class of depth-2, finite-width neural networks that includes single-filter convolutional networks. For this class of networks, we attempt to learn the network weights in the presence of a malicious oracle that applies stochastic, bounded, additive adversarial distortions to the true output during training. For the non-gradient stochastic algorithm that we construct, we prove worst-case near-optimal trade-offs among the magnitude of the adversarial attack, the accuracy of the weight estimates, and the confidence achieved by the algorithm. Since our algorithm uses mini-batching, we also analyze how the mini-batch size affects convergence. We further show how the scaling of the outer-layer weights can be used to counter output-poisoning attacks, depending on the probability of attack. Lastly, we give experimental evidence that our algorithm outperforms stochastic gradient descent under various input data distributions, including instances of heavy-tailed distributions.
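To make the setup concrete, below is a minimal sketch (in Python/NumPy) of the threat model and of a Tron-style, non-gradient mini-batch update, shown for the simplest member of the network class, a single ReLU unit. All names (`net`, `poisoned_labels`, `tron_step`), the uniform choice of the bounded distortion, and the GLM-Tron-style residual update are illustrative assumptions for exposition, not the paper's exact algorithm or constants.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def net(w, X):
    # Simplest member of the depth-2 class considered here: a single
    # ReLU unit with weight vector w. X: (batch, d) -> (batch,)
    return relu(X @ w)

def poisoned_labels(w_star, X, theta, beta, rng):
    # Malicious oracle: returns the true outputs plus a stochastic,
    # additive distortion of magnitude at most theta, applied to each
    # query independently with probability beta (illustrative choice).
    y = net(w_star, X)
    attacked = rng.random(len(y)) < beta
    y[attacked] += theta * (2.0 * rng.random(attacked.sum()) - 1.0)
    return y

def tron_step(w, X, y, eta):
    # Non-gradient (GLM-Tron-style) update: the raw residual y - f_w(x)
    # multiplies the input directly; no derivative of the activation is
    # used, unlike the SGD update for the squared loss.
    residual = y - net(w, X)
    return w + eta * (X.T @ residual) / len(y)

# Toy run: estimate w_star from poisoned mini-batches. Expect w to reach
# only a neighborhood of w_star whose radius grows with theta, mirroring
# the attack-magnitude vs. accuracy trade-off discussed above.
d, batch, theta, beta, eta = 10, 64, 0.1, 0.5, 0.1
w_star = rng.standard_normal(d)
w = np.zeros(d)
for _ in range(2000):
    X = rng.standard_normal((batch, d))   # fresh mini-batch of inputs
    y = poisoned_labels(w_star, X, theta, beta, rng)
    w = tron_step(w, X, y, eta)
print("parameter error:", np.linalg.norm(w - w_star))
```

The contrast with SGD lies in the update: the residual multiplies the input with no activation derivative, which is what makes the method non-gradient in the sense used above.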