We introduce a divergence measure between data distributions based on operators in reproducing kernel Hilbert spaces defined by infinitely divisible kernels. The empirical estimator of the divergence is computed from the eigenvalues of positive definite matrices obtained by evaluating the kernel over pairs of samples. The new measure has properties similar to those of the Jensen-Shannon divergence. Convergence of the proposed estimators follows from concentration results based on the difference between the ordered spectrum of the Gram matrices and that of the integral operators associated with the population quantities. The proposed measure avoids estimating the probability distribution underlying the data. Numerical experiments on comparing distributions and on sampling unbalanced data for classification show that the proposed divergence can achieve state-of-the-art results.
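
To make the estimator described above concrete, the following is a minimal sketch of a Jensen-Shannon-style divergence computed from the eigenvalues of Gram matrices. The abstract does not state the exact formula, so every specific choice here is an assumption made for illustration: the Gaussian kernel (one example of an infinitely divisible kernel), the bandwidth `sigma`, the use of the von Neumann entropy of the trace-normalized Gram matrix as the entropy functional, the sample-size weighting of the two entropies, and the hypothetical function names `gaussian_gram`, `spectral_entropy`, and `spectral_js_divergence`.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel over the rows of x (assumed kernel)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))


def spectral_entropy(gram):
    """Von Neumann entropy (in nats) of the trace-normalized Gram matrix."""
    a = gram / np.trace(gram)
    eig = np.linalg.eigvalsh(a)   # eigenvalues of a symmetric matrix
    eig = eig[eig > 1e-12]        # drop numerically zero eigenvalues
    return float(-np.sum(eig * np.log(eig)))


def spectral_js_divergence(x, y, sigma=1.0):
    """Entropy of the pooled sample minus the weighted entropies of each sample.

    Illustrative assumption, not necessarily the paper's estimator.
    """
    n, m = len(x), len(y)
    z = np.vstack([x, y])  # pooled sample plays the role of the mixture
    h_mix = spectral_entropy(gaussian_gram(z, sigma))
    h_x = spectral_entropy(gaussian_gram(x, sigma))
    h_y = spectral_entropy(gaussian_gram(y, sigma))
    return h_mix - (n * h_x + m * h_y) / (n + m)
```

Under these assumptions no density is estimated at any point: only kernel evaluations over pairs of samples and an eigendecomposition are needed. With equal sample sizes the value lies between 0 (identical samples) and log 2 (samples whose cross-kernel values vanish), mirroring the range of the Jensen-Shannon divergence.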