Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions when equipped with appropriately learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, given a considerable number of observed samples from each distribution. In realistic scenarios with very limited numbers of data samples, however, it can be challenging to identify a kernel powerful enough to distinguish complex distributions. We address this issue by introducing the problem of meta two-sample testing (M2ST), which aims to exploit (abundant) auxiliary data on related tasks to find an algorithm that can quickly identify a powerful test on new target tasks. We propose two specific algorithms for this task: a generic scheme that improves over baselines, and a more tailored approach that performs even better. We provide both theoretical justification and empirical evidence that our proposed meta-testing schemes outperform learning kernel-based tests directly from scarce observations, and we identify when such schemes will be successful.
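For readers unfamiliar with kernel two-sample tests, the sketch below shows one common instance: a maximum mean discrepancy (MMD) test with an unbiased MMD² estimate and a permutation p-value. It is an illustration only, not the meta-testing method described above. The fixed Gaussian `bandwidth` stands in for the learned kernel, and the function names (`gaussian_kernel`, `mmd2_unbiased`, `mmd_permutation_test`) and default parameters are hypothetical choices for this example.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    # Pairwise squared Euclidean distances between rows of a and rows of b
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    # Clip tiny negative values caused by floating-point error before exponentiating
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth):
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Drop diagonal (self-similarity) terms to obtain the unbiased estimator
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2 * kxy.mean()

def mmd_permutation_test(x, y, bandwidth=1.0, n_perms=200, alpha=0.05, seed=0):
    # Hypothetical helper: fixed kernel, permutation null distribution
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(x, y, bandwidth)
    pooled = np.concatenate([x, y])
    m = len(x)
    null_stats = np.empty(n_perms)
    for i in range(n_perms):
        # Reassign pooled samples to the two groups at random (valid under H0: P = Q)
        idx = rng.permutation(len(pooled))
        null_stats[i] = mmd2_unbiased(pooled[idx[:m]], pooled[idx[m:]], bandwidth)
    # Add-one correction keeps the p-value strictly positive
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perms)
    return observed, p_value, p_value <= alpha

# Usage: two Gaussians whose means differ by 1 in each of 2 dimensions
data_rng = np.random.default_rng(1)
x = data_rng.normal(0.0, 1.0, size=(100, 2))
y = data_rng.normal(1.0, 1.0, size=(100, 2))
stat, p, reject = mmd_permutation_test(x, y)
print(f"MMD^2 = {stat:.4f}, p = {p:.4f}, reject H0: {reject}")
```

With only a handful of samples per distribution, the power of such a test depends heavily on the kernel choice, which is the difficulty that meta two-sample testing targets by borrowing information from related tasks.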