We introduce a divergence measure between data distributions based on operators in reproducing kernel Hilbert spaces defined by kernels. The empirical estimator of the divergence is computed from the eigenvalues of positive definite Gram matrices obtained by evaluating the kernel over pairs of data points. The new measure shares several properties with the Jensen-Shannon divergence. Convergence of the proposed estimators follows from concentration results on the difference between the ordered spectra of the Gram matrices and those of the integral operators associated with the population quantities. The proposed divergence avoids estimating the probability distribution underlying the data. Numerical experiments on comparing distributions and on sampling unbalanced data for classification show that the proposed divergence can achieve state-of-the-art results.
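To make the eigenvalue-based estimator concrete, the following is a minimal sketch of one way such a quantity could be computed. The abstract does not state the exact formula, so everything specific here is an assumption: the Gaussian kernel, the von Neumann entropy of trace-normalized Gram matrices, the Jensen-Shannon-style mixture form, and the hypothetical names `gaussian_gram`, `spectral_entropy`, and `kernel_js_divergence`. It should not be read as the authors' definition.

```python
import numpy as np


def gaussian_gram(x, y, sigma=1.0):
    """Gaussian kernel evaluated over all pairs of rows of x and y (assumed kernel)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def spectral_entropy(gram):
    """Von Neumann entropy of a Gram matrix normalized to unit trace."""
    eigvals = np.linalg.eigvalsh(gram / np.trace(gram))
    eigvals = eigvals[eigvals > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(eigvals * np.log(eigvals)))


def kernel_js_divergence(x, y, sigma=1.0):
    """Entropy of the pooled sample minus the weighted entropies of each sample.

    Assumed Jensen-Shannon-style form; only Gram-matrix eigenvalues are used,
    so no density estimate is required.
    """
    n, m = len(x), len(y)
    z = np.vstack([x, y])  # pooled sample stands in for the mixture
    h_mix = spectral_entropy(gaussian_gram(z, z, sigma))
    h_x = spectral_entropy(gaussian_gram(x, x, sigma))
    h_y = spectral_entropy(gaussian_gram(y, y, sigma))
    return h_mix - (n / (n + m)) * h_x - (m / (n + m)) * h_y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(40, 2))
    y_same = rng.normal(size=(40, 2))
    y_shifted = rng.normal(loc=5.0, size=(40, 2))
    print("same distribution:   ", kernel_js_divergence(x, y_same))
    print("shifted distribution:", kernel_js_divergence(x, y_shifted))
```

Under these assumptions the value is non-negative, because the von Neumann entropy is concave, and it is bounded above by log 2 for equally sized samples, which mirrors the behavior of the Jensen-Shannon divergence. The bandwidth `sigma` is a free parameter in this sketch and would need to be chosen for the data at hand.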