We introduce a meta-learning algorithm for adversarially robust classification. The proposed method is designed to be as model-agnostic as possible: it optimizes a dataset before its deployment in a machine learning system, aiming to erase its non-robust features. Once the dataset has been created, in principle no specialized algorithm (beyond standard gradient descent) is needed to train a robust model. We formulate the data optimization procedure as a bi-level optimization problem on kernel regression, using a class of kernels that describe infinitely wide neural networks (Neural Tangent Kernels). We present extensive experiments on standard computer vision benchmarks with a variety of models, demonstrating the effectiveness of our method while also pointing out its current shortcomings. In parallel, we revisit prior work that also addressed data optimization for robust classification \citep{Ily+19}, and show that achieving robustness to adversarial attacks after standard (gradient descent) training on a suitable dataset is more challenging than previously thought.
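To make the bi-level structure concrete, the following is a minimal sketch and not the method of this work. It assumes an RBF kernel as a stand-in for a Neural Tangent Kernel, a plain squared-error outer loss in place of any robustness objective, and finite-difference gradients on a toy 2-D problem. All function names (`rbf_kernel`, `outer_loss`, `numerical_grad`, `optimize_dataset`) and hyperparameters are hypothetical. The inner problem is kernel ridge regression on the learned points, solved in closed form; the outer problem moves those points to reduce the error on target data.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Stand-in kernel (assumption): the abstract uses Neural Tangent Kernels.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def outer_loss(x_s, y_s, x_t, y_t, reg=1e-3):
    # Inner problem: kernel ridge regression on (x_s, y_s), solved in closed form.
    k_ss = rbf_kernel(x_s, x_s)
    alpha = np.linalg.solve(k_ss + reg * np.eye(len(x_s)), y_s)
    # Outer objective (assumption): squared error of that regressor on target data.
    preds = rbf_kernel(x_t, x_s) @ alpha
    return 0.5 * np.mean((preds - y_t) ** 2)

def numerical_grad(f, x, eps=1e-5):
    # Central finite differences; adequate only for this tiny example.
    g = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        g[idx] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return g

def optimize_dataset(x_s, y_s, x_t, y_t, steps=50, lr=0.5):
    # Outer loop: gradient descent on the learned inputs x_s.
    loss = outer_loss(x_s, y_s, x_t, y_t)
    for _ in range(steps):
        grad = numerical_grad(lambda x: outer_loss(x, y_s, x_t, y_t), x_s)
        step = lr
        # Backtracking: halve the step until the outer loss does not increase.
        while step > 1e-8:
            cand = x_s - step * grad
            cand_loss = outer_loss(cand, y_s, x_t, y_t)
            if cand_loss <= loss:
                x_s, loss = cand, cand_loss
                break
            step *= 0.5
    return x_s, loss

rng = np.random.default_rng(0)
x_t = rng.normal(size=(40, 2))            # toy target inputs
y_t = np.sign(x_t[:, :1])                 # toy labels from the first coordinate
x_s0 = rng.normal(size=(4, 2))            # initial learned inputs
y_s = np.array([[1.0], [-1.0], [1.0], [-1.0]])
initial_loss = outer_loss(x_s0, y_s, x_t, y_t)
x_s, final_loss = optimize_dataset(x_s0, y_s, x_t, y_t)
```

Because the inner problem has a closed-form solution, the outer loss is an explicit function of the learned points, which is what makes a kernel formulation convenient. In practice, automatic differentiation would replace the finite-difference gradient used here.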