Detecting deepfakes is an important problem, but recent work has shown that DNN-based deepfake detectors are brittle against adversarial deepfakes, in which an adversary adds imperceptible perturbations to a deepfake to evade detection. In this work, we show that a modification to the detection strategy, replacing a single classifier with a carefully chosen ensemble in which the input transformation for each model induces pairwise orthogonal gradients, can significantly improve robustness beyond the de facto solution of adversarial training. We present theoretical results showing that such orthogonal gradients can help thwart a first-order adversary by reducing the dimensionality of the input subspace in which adversarial deepfakes lie. We validate these results empirically by instantiating and evaluating a randomized version of such "orthogonal" ensembles for adversarial deepfake detection, and find that these randomized ensembles exhibit significantly higher robustness against adversarial deepfakes than state-of-the-art deepfake detectors, even against deepfakes crafted using strong PGD-500 attacks.
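The core geometric idea can be illustrated with a minimal, hypothetical sketch (not the paper's actual detectors): build an ensemble of linear "detectors" whose per-model input transformations `T_i` are constructed so that the input gradients `T_i.T @ w` are pairwise orthogonal. A first-order attacker following any one member's gradient then moves in a direction that the other members' gradients do not share.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy input dimensionality (a real detector would operate on images)

# Orthonormal target directions for the ensemble's input gradients.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Shared unit weight vector for the toy linear scoring function w @ (T @ x).
w = rng.standard_normal(d)
w /= np.linalg.norm(w)

# Hypothetical construction: choose T_i = w @ Q[:, i].T so that the input
# gradient of x -> w @ (T_i @ x), which is T_i.T @ w, equals Q[:, i].
transforms = [np.outer(w, Q[:, i]) for i in range(3)]

def score(x, T):
    """Toy ensemble-member score: a linear model applied to a transformed input."""
    return w @ (T @ x)

def input_grad(T):
    """Gradient of score(x, T) with respect to x (constant for a linear model)."""
    return T.T @ w

grads = [input_grad(T) for T in transforms]

# Pairwise cosine similarities are ~0: the gradients are mutually orthogonal,
# so a perturbation aligned with one member's gradient barely moves the others.
for i in range(3):
    for j in range(i + 1, 3):
        cos = grads[i] @ grads[j] / (np.linalg.norm(grads[i]) * np.linalg.norm(grads[j]))
        print(f"cos(grad_{i}, grad_{j}) = {cos:.2e}")
```

This is only a linear caricature of the construction: it shows how input transformations can steer ensemble members' gradients into orthogonal directions, which is the property the abstract's theoretical results exploit to shrink the adversarial subspace available to a first-order attacker.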