Many interesting machine learning problems are best posed by considering instances that are distributions, or sample sets drawn from distributions. Previous work on machine learning tasks with distributional inputs has typically relied on pairwise kernel evaluations between pdfs (or sample sets). While such an approach is fine for smaller datasets, computing an $N \times N$ Gram matrix is prohibitive for large datasets. Recent scalable estimators that work over pdfs have done so only with kernels that use Euclidean metrics, like the $L_2$ distance. However, there are myriad other useful metrics available, such as total variation, Hellinger distance, and the Jensen-Shannon divergence. This work develops the first random features for pdfs whose dot product approximates kernels using these non-Euclidean metrics, allowing estimators using such kernels to scale to large datasets by working in a primal space, without computing large Gram matrices. We provide an analysis of the approximation error incurred by using our proposed random features and show empirically the quality of our approximation both in estimating a Gram matrix and in solving learning tasks on real-world and synthetic data.
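As a point of reference for the general principle the abstract relies on, the sketch below shows standard random Fourier features (Rahimi and Recht) whose dot products approximate a Gaussian RBF kernel. This is not the paper's construction for total variation, Hellinger, or Jensen-Shannon kernels; it only illustrates how a feature map $z(x)$ with $z(x)^\top z(y) \approx k(x, y)$ lets estimators work in a primal space instead of forming an $N \times N$ Gram matrix. All function and parameter names here are illustrative.

```python
# Minimal sketch of the random-features idea: random Fourier features for a
# Gaussian RBF kernel (a generic illustration, NOT the paper's non-Euclidean
# random features for pdfs).
import numpy as np

def random_fourier_features(X, n_features=500, gamma=1.0, rng=None):
    """Map rows of X to features whose dot products approximate
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density (a Gaussian).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    Z = random_fourier_features(X, n_features=2000, gamma=0.5, rng=1)
    K_approx = Z @ Z.T                                    # primal-space estimate
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K_exact = np.exp(-0.5 * sq_dists)                     # exact Gram matrix
    print("max abs error:", np.abs(K_approx - K_exact).max())
```

In the primal space, a linear model fit on `Z` plays the role of a kernel machine fit with the full Gram matrix, at cost linear rather than quadratic in $N$; the paper's contribution is an analogous feature map when the inputs are pdfs and the kernel is built from a non-Euclidean metric.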