Two-sample and independence tests based on the kernel statistics MMD and HSIC have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to non-stationary random processes, a prevalent form of data in many scientific disciplines. In this work, we extend MMD and HSIC to non-stationary settings by assuming access to independent realisations of the underlying random process. These realisations, in the form of non-stationary time series measured on the same temporal grid, can then be viewed as i.i.d. samples from a multivariate probability distribution, to which MMD and HSIC can be applied. We further show how to choose suitable kernels over these high-dimensional spaces by maximising the estimated test power with respect to the kernel hyper-parameters. In experiments on synthetic data, our proposed approaches achieve higher test power than current state-of-the-art functional or multivariate two-sample and independence tests. Finally, we apply our methods to a real socio-economic dataset as an example application.
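The core idea, treating each time series observed on a shared grid as one i.i.d. draw from a multivariate distribution, can be illustrated with a minimal two-sample sketch. It is not the authors' implementation: the Gaussian kernel, the median-heuristic bandwidth (used here in place of the power-maximising kernel selection described above), the biased MMD estimate, the permutation test, and all function names and simulation settings are assumptions made for illustration.

```python
import numpy as np

def squared_distances(Z):
    # Pairwise squared Euclidean distances between rows (one row = one time series).
    sq = np.sum(Z ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)

def gaussian_gram(Z):
    # Gaussian kernel with a median-heuristic bandwidth (an assumption; the
    # abstract instead selects hyper-parameters by maximising test power).
    d2 = squared_distances(Z)
    bandwidth2 = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-d2 / (2.0 * bandwidth2))

def mmd2_biased(K, idx_x, idx_y):
    # Biased (V-statistic) estimate of squared MMD from a precomputed Gram matrix.
    k_xx = K[np.ix_(idx_x, idx_x)].mean()
    k_yy = K[np.ix_(idx_y, idx_y)].mean()
    k_xy = K[np.ix_(idx_x, idx_y)].mean()
    return k_xx + k_yy - 2.0 * k_xy

def mmd_permutation_test(X, Y, n_perm=500, seed=0):
    # X, Y: arrays of shape (n_realisations, n_grid_points) on the same grid.
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n = len(X)
    K = gaussian_gram(Z)
    idx = np.arange(len(Z))
    observed = mmd2_biased(K, idx[:n], idx[n:])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))  # shuffle sample labels under the null
        if mmd2_biased(K, perm[:n], perm[n:]) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value

def simulate(rng, n, drift, n_grid=20, step_std=0.2):
    # Hypothetical non-stationary process: seasonal mean + linear drift + random walk.
    t = np.linspace(0.0, 1.0, n_grid)
    walk = np.cumsum(rng.normal(0.0, step_std, size=(n, n_grid)), axis=1)
    return np.sin(2.0 * np.pi * t) + drift * t + walk

rng = np.random.default_rng(42)
X = simulate(rng, n=40, drift=0.0)
Y = simulate(rng, n=40, drift=3.0)
stat, p = mmd_permutation_test(X, Y)
print(f"MMD^2 = {stat:.4f}, permutation p-value = {p:.4f}")
```

Because each row is a whole realisation, no stationarity assumption enters the test; the only requirement in this sketch is that all series share the same temporal grid and that realisations are independent.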