A Hilbert space embedding of a distribution, in short a kernel mean embedding, has recently emerged as a powerful tool for machine learning and inference. The basic idea behind this framework is to map distributions into a reproducing kernel Hilbert space (RKHS) in which the whole arsenal of kernel methods can be extended to probability measures. It can be viewed as a generalization of the original "feature map" common to support vector machines (SVMs) and other kernel methods. While initially closely associated with the latter, it has meanwhile found application in fields ranging from kernel machines and probabilistic modeling to statistical inference, causal discovery, and deep learning. The goal of this survey is to give a comprehensive review of existing work and recent advances in this research area, and to discuss the most challenging issues and open problems that could lead to new research directions. The survey begins with a brief introduction to the RKHS and positive definite kernels, which form the backbone of this survey, followed by a thorough discussion of the Hilbert space embedding of marginal distributions, theoretical guarantees, and a review of its applications. The embedding of distributions enables us to apply RKHS methods to probability measures, which prompts a wide range of applications such as kernel two-sample testing, independence testing, and learning on distributional data. Next, we discuss the Hilbert space embedding for conditional distributions, give theoretical insights, and review some applications. The conditional mean embedding enables us to perform the sum, product, and Bayes' rules, which are ubiquitous in graphical models, probabilistic inference, and reinforcement learning, in a non-parametric way. We then discuss relationships between this framework and other related areas. Lastly, we give some suggestions on future research directions.
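To make the idea of comparing distributions through their embeddings concrete, here is a minimal sketch of the kernel two-sample statistic mentioned above. It estimates the squared RKHS distance between two empirical mean embeddings, commonly called the maximum mean discrepancy (MMD). The Gaussian kernel, the `bandwidth` parameter, the function names, and the synthetic data are all assumptions chosen for illustration and are not taken from the survey.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of the squared distance between the empirical
    mean embeddings of the samples x and y (illustrative helper)."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    # ||mu_x - mu_y||^2 = <mu_x, mu_x> + <mu_y, mu_y> - 2 <mu_x, mu_y>
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Synthetic data (assumption): two samples from N(0, I), one from N(1, I).
rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(500, 2))
same_b = rng.normal(0.0, 1.0, size=(500, 2))
shifted = rng.normal(1.0, 1.0, size=(500, 2))

mmd_same = mmd2_biased(same_a, same_b)
mmd_shifted = mmd2_biased(same_a, shifted)
print("same distribution:   ", mmd_same)
print("shifted distribution:", mmd_shifted)
```

The statistic should be close to zero when both samples come from the same distribution and noticeably larger when they do not. An actual two-sample test would additionally need a rejection threshold, for example from a permutation procedure, which this sketch omits.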