Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions when equipped with appropriately learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds when a considerable number of samples is observed from each distribution. In realistic scenarios with very limited numbers of data samples, however, it can be challenging to identify a kernel powerful enough to distinguish complex distributions. We address this issue by introducing the problem of meta two-sample testing (M2ST), which aims to exploit (abundant) auxiliary data on related tasks to find an algorithm that can quickly identify a powerful test on new target tasks. We propose two specific algorithms for this task: a generic scheme that improves over baselines, and a more tailored approach that performs even better. We provide both theoretical justification and empirical evidence that our proposed meta-testing schemes outperform learning kernel-based tests directly from scarce observations, and we identify when such schemes will be successful.