Data poisoning attacks pose one of the biggest threats to modern AI systems, necessitating robust defenses. Although extensive effort has gone into empirical defenses, attackers continue to evolve and devise sophisticated methods to circumvent them. To address this, we must move beyond empirical defenses and establish provable certification methods that guarantee robustness. This paper introduces BiCert, a novel certification approach that uses Bilinear Mixed Integer Programming (BMIP) to compute sound, deterministic bounds that provide such provable robustness. Using BMIP, we compute the reachable set of parameters that could result from training on potentially manipulated data. Key to making this computation feasible is relaxing the reachable parameter set to a convex set between training iterations. At test time, this parameter set lets us determine all possible predictions, thereby guaranteeing robustness. BiCert is more precise than previous methods, which rely solely on interval and polyhedral bounds. Crucially, our approach overcomes a fundamental limitation of prior approaches, in which parameter bounds could only grow, often uncontrollably. We show that BiCert's tighter bounds eliminate a key source of divergence issues, resulting in more stable training and higher certified accuracy.
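To make the notion of a reachable parameter set concrete, the following toy sketch propagates an interval over a single scalar weight through one gradient step when every training input may be shifted by at most `eps`. It illustrates the interval-style baseline that the abstract contrasts against, not BiCert or its BMIP formulation. The model (scalar linear regression with squared loss), the function names, and all numeric values are assumptions chosen purely for illustration.

```python
def interval_mul(a, b):
    """Sound bounds on the product of two intervals a = (lo, hi), b = (lo, hi)."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


def interval_sgd_step(w, xs, ys, eps, lr):
    """Over-approximate the weights reachable after one gradient step.

    w  : (lo, hi) interval for the current scalar weight
    xs : clean inputs; each may be poisoned within [x - eps, x + eps]
    ys : labels (assumed unperturbed in this toy setting)
    """
    n = len(xs)
    g_lo = g_hi = 0.0
    for x, y in zip(xs, ys):
        x_iv = (x - eps, x + eps)
        wx = interval_mul(w, x_iv)          # bilinear term w * x
        r = (wx[0] - y, wx[1] - y)          # residual w * x - y
        rx = interval_mul(r, x_iv)          # per-sample gradient (w * x - y) * x
        g_lo += rx[0] / n
        g_hi += rx[1] / n
    # w - lr * g: the lower bound uses the largest gradient, and vice versa.
    return (w[0] - lr * g_hi, w[1] - lr * g_lo)


def concrete_sgd_step(w, xs, ys, lr):
    """Ordinary gradient step on one fixed (possibly poisoned) data set."""
    g = sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w - lr * g


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, -1.5], [2.0, 4.1, -2.9]
    w = (0.0, 0.0)
    for step in range(3):
        w = interval_sgd_step(w, xs, ys, eps=0.1, lr=0.05)
        print(step, w, "width:", w[1] - w[0])
```

In this sketch the width of the new interval equals the old width plus `lr` times the width of the gradient interval, so it can never shrink from one step to the next. This is one simple way to see how purely interval-based parameter bounds can only grow during training.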