Non-parametric two-sample tests (TSTs), which judge whether two sets of samples are drawn from the same distribution, have been widely used in the analysis of critical data. People tend to employ TSTs as trusted basic tools and rarely doubt their reliability. This paper systematically uncovers the failure modes of non-parametric TSTs through adversarial attacks and then proposes corresponding defense strategies. First, we theoretically show that an adversary can upper-bound the distributional shift, which guarantees the attack's invisibility. Furthermore, we theoretically find that the adversary can also degrade the lower bound of a TST's test power, which enables us to iteratively minimize the test criterion in order to search for adversarial pairs. To enable TST-agnostic attacks, we propose an ensemble attack (EA) framework that jointly minimizes different types of test criteria. Second, to robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels. Extensive experiments on both simulated and real-world datasets validate the adversarial vulnerabilities of non-parametric TSTs and the effectiveness of our proposed defense.
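To make the attack idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of searching for an adversarial pair by iteratively minimizing an MMD-based test criterion under a bounded perturbation. The kernel choice (fixed Gaussian), the L-infinity budget, and all names such as `gaussian_mmd2`, `attack_mmd`, `sigma`, and `epsilon` are illustrative assumptions.

```python
# Sketch: drive an MMD-based test criterion down by perturbing one sample set,
# while keeping the perturbation (and hence the distributional shift) bounded.
import torch

def gaussian_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x and y (Gaussian kernel)."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def attack_mmd(x, y, epsilon=0.1, steps=50, lr=0.01):
    """Perturb y within an L-infinity ball of radius epsilon so the test criterion
    shrinks, making the two samples look as if drawn from the same distribution."""
    y_adv = y.clone().requires_grad_(True)
    for _ in range(steps):
        loss = gaussian_mmd2(x, y_adv)          # test criterion to minimize
        grad, = torch.autograd.grad(loss, y_adv)
        with torch.no_grad():
            y_adv -= lr * grad.sign()           # signed-gradient descent step
            # project back into the epsilon-ball around the original samples
            y_adv.copy_(y + (y_adv - y).clamp(-epsilon, epsilon))
    return y_adv.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(200, 2)           # samples from P
    y = torch.randn(200, 2) + 1.0     # samples from a shifted Q
    print("MMD^2 before attack:", gaussian_mmd2(x, y).item())
    y_adv = attack_mmd(x, y)
    print("MMD^2 after attack: ", gaussian_mmd2(x, y_adv).item())
```

Under these assumptions, the ensemble attack would replace the single criterion with a weighted sum over several test criteria, and the defense would wrap such a search inside an outer loop that updates the deep-kernel parameters to maximize the criterion on the generated adversarial pairs.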