Unsupervised tree boosting for learning probability distributions

We propose an unsupervised tree boosting algorithm for inferring the underlying sampling distribution of an i.i.d. sample based on fitting additive tree ensembles in a fashion analogous to supervised tree boosting. Integral to the algorithm is a new notion of "addition" on probability distributions that leads to a coherent notion of "residualization", i.e., subtracting a probability distribution from an observation to remove the distributional structure from the sampling distribution of the latter. We show that these notions arise naturally for univariate distributions through cumulative distribution function (CDF) transforms and compositions, owing to several "group-like" properties of univariate CDFs. While the traditional multivariate CDF does not preserve these properties, a new definition of the multivariate CDF can restore them, thereby allowing the notions of "addition" and "residualization" to be formulated for multivariate settings as well. This then gives rise to the unsupervised boosting algorithm based on forward-stagewise fitting of an additive tree ensemble, which sequentially reduces the Kullback-Leibler divergence from the truth. The algorithm allows analytic evaluation of the fitted density and outputs a generative model that can be readily sampled from. We enhance the algorithm with scale-dependent shrinkage and a two-stage strategy that separately fits the marginals and the copula. The resulting algorithm performs competitively with state-of-the-art deep-learning approaches in multivariate density estimation on multiple benchmark datasets.
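The univariate case described above can be illustrated with a small sketch. It is not the proposed boosting algorithm, only one reading of the abstract's "addition" and "residualization" on the unit interval: "adding" G1 and G2 is taken to mean composing their CDFs, G2(G1(x)), and "residualizing" an observation by G1 means applying the CDF G1 to it. The two component distributions (Beta(2,1) and Beta(1,2)) and every function name below are assumptions chosen for illustration.

```python
import random

# Hypothetical components on [0, 1], chosen because their CDFs
# and inverse CDFs have simple closed forms.
def cdf_g1(x):
    # CDF of Beta(2, 1): x^2
    return x * x

def inv_cdf_g1(u):
    return u ** 0.5

def cdf_g2(x):
    # CDF of Beta(1, 2): 1 - (1 - x)^2
    return 1.0 - (1.0 - x) ** 2

def inv_cdf_g2(u):
    return 1.0 - (1.0 - u) ** 0.5

def cdf_sum(x):
    # Assumed reading of "addition": compose the CDFs, G2(G1(x)).
    return cdf_g2(cdf_g1(x))

def sample_sum(rng):
    # Inverse-transform sampling from the composed CDF:
    # x = G1^{-1}(G2^{-1}(u)) with u ~ Uniform(0, 1).
    return inv_cdf_g1(inv_cdf_g2(rng.random()))

def residualize(x, cdf):
    # Assumed reading of "residualization": apply the CDF to the observation.
    return cdf(x)

def ks_distance(cdf, xs):
    # Kolmogorov-Smirnov distance between the empirical CDF of xs and cdf.
    xs = sorted(xs)
    n = len(xs)
    return max(
        max(abs((i + 1) / n - cdf(x)), abs(i / n - cdf(x)))
        for i, x in enumerate(xs)
    )

def demo(n=20000, seed=0):
    rng = random.Random(seed)
    xs = [sample_sum(rng) for _ in range(n)]
    # Removing G1 should leave residuals distributed as G2.
    r1 = [residualize(x, cdf_g1) for x in xs]
    # Removing G2 as well should leave (approximately) uniform residuals.
    r2 = [residualize(r, cdf_g2) for r in r1]
    return ks_distance(cdf_g2, r1), ks_distance(lambda u: u, r2)
```

Calling `demo()` returns two Kolmogorov-Smirnov distances, both of which should be small: after the first component is removed the residuals follow the second component, and after both are removed they are close to uniform. In this reading, that is the sense in which residualization strips distributional structure one component at a time.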
