In modern data analysis, nonparametric measures of discrepancy between random variables are particularly important. The subject is well studied in the frequentist literature, whereas development in the Bayesian setting has been limited, with applications often restricted to univariate cases. Here, we propose a Bayesian kernel two-sample testing procedure based on modelling the difference between kernel mean embeddings in the reproducing kernel Hilbert space, utilising the framework established by Flaxman et al. (2016). The use of kernel methods enables the procedure to be applied to random variables in generic domains beyond multivariate Euclidean spaces. The proposed procedure results in a posterior inference scheme that allows automatic selection of the kernel parameters relevant to the problem at hand. In a series of synthetic experiments and two real-data experiments (testing network heterogeneity from high-dimensional data, and comparing conformations of six-membered monocyclic rings), we illustrate the advantages of our approach.
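For reference, the quantity being modelled (the difference between the kernel mean embeddings of two samples) can be illustrated by its standard frequentist estimate, the biased empirical squared maximum mean discrepancy (MMD). This is a minimal sketch, not the proposed Bayesian procedure. It assumes a Gaussian RBF kernel with a fixed, hand-picked lengthscale, whereas the procedure described above infers kernel parameters through the posterior. The function names `rbf_kernel` and `mmd2_biased` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    # Gaussian RBF kernel matrix: k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * lengthscale^2))
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point cancellation
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

def mmd2_biased(x, y, lengthscale=1.0):
    # Squared RKHS norm of the difference between empirical mean embeddings:
    # ||mu_X - mu_Y||^2 = mean(K_xx) + mean(K_yy) - 2 * mean(K_xy)
    kxx = rbf_kernel(x, x, lengthscale)
    kyy = rbf_kernel(y, y, lengthscale)
    kxy = rbf_kernel(x, y, lengthscale)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))        # sample from P
y_same = rng.normal(0.0, 1.0, size=(200, 2))   # sample from the same distribution
y_shift = rng.normal(1.0, 1.0, size=(200, 2))  # sample from a mean-shifted distribution
print(mmd2_biased(x, y_same), mmd2_biased(x, y_shift))
```

In frequentist practice this statistic is typically compared against a permutation null distribution, and its power depends on the kernel lengthscale. That sensitivity to kernel parameters is what motivates selecting them automatically.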