We propose methods for density estimation and data synthesis using a novel form of unsupervised random forests. Inspired by generative adversarial networks, we implement a recursive procedure in which trees gradually learn structural properties of the data through alternating rounds of generation and discrimination. The method is provably consistent under minimal assumptions. Unlike classic tree-based alternatives, our approach provides smooth (un)conditional densities and allows for fully synthetic data generation. We achieve comparable or superior performance to state-of-the-art probabilistic circuits and deep learning models on various tabular data benchmarks while executing about two orders of magnitude faster on average. An accompanying $\texttt{R}$ package, $\texttt{arf}$, is available on $\texttt{CRAN}$.
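The abstract describes two uses of a trained forest: evaluating a density and drawing synthetic rows. Below is a minimal Python sketch of both, under stated assumptions. It does not implement the adversarial training loop, in which trees are fit to separate real from synthetic data and the synthetic data is regenerated until the trees can no longer tell the two apart. Instead it assumes the forest is already trained and is represented as a pooled list of leaves. Each leaf is an axis-aligned box with a coverage weight (its share of the real data) and per-feature Gaussian parameters truncated to the box. The names `Leaf`, `density`, and `sample` are hypothetical and are not the API of the `arf` package. The product over features encodes an assumption that features are roughly independent within each leaf.

```python
import random
from dataclasses import dataclass
from statistics import NormalDist
from typing import List, Sequence, Tuple


@dataclass
class Leaf:
    """One leaf of an already-trained forest (hypothetical representation)."""
    weight: float                      # coverage: share of real samples in this leaf
    bounds: List[Tuple[float, float]]  # (lower, upper) limits per feature; may be infinite
    means: List[float]                 # per-feature mean of the real data in the leaf
    sds: List[float]                   # per-feature std. dev. (assumed > 0)


def density(leaves: Sequence[Leaf], x: Sequence[float]) -> float:
    """Mixture density: sum over leaves of weight * product of truncated Gaussians."""
    total = sum(leaf.weight for leaf in leaves)
    result = 0.0
    for leaf in leaves:
        p = leaf.weight / total
        for xj, mu, sd, (lo, hi) in zip(x, leaf.means, leaf.sds, leaf.bounds):
            if not lo <= xj <= hi:  # x lies outside this leaf's box
                p = 0.0
                break
            dist = NormalDist(mu, sd)
            # Gaussian density renormalized to [lo, hi]; assumes the interval
            # carries non-negligible Gaussian mass (otherwise this divides by ~0).
            p *= dist.pdf(xj) / (dist.cdf(hi) - dist.cdf(lo))
        result += p
    return result


def _sample_truncated(mu: float, sd: float, lo: float, hi: float,
                      rng: random.Random) -> float:
    """Inverse-CDF draw from N(mu, sd^2) restricted to [lo, hi]."""
    dist = NormalDist(mu, sd)
    u = rng.uniform(dist.cdf(lo), dist.cdf(hi))
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    return min(max(dist.inv_cdf(u), lo), hi)  # guard against rounding outside the box


def sample(leaves: Sequence[Leaf], n: int, rng: random.Random) -> List[List[float]]:
    """Draw n synthetic rows: pick a leaf by coverage, then each feature independently."""
    chosen = rng.choices(leaves, weights=[leaf.weight for leaf in leaves], k=n)
    return [
        [_sample_truncated(mu, sd, lo, hi, rng)
         for mu, sd, (lo, hi) in zip(leaf.means, leaf.sds, leaf.bounds)]
        for leaf in chosen
    ]
```

Every truncated Gaussian integrates to one over its leaf, and the weights are normalized, so the mixture is a proper density. Within each leaf it varies continuously rather than being piecewise constant, as a histogram-style leaf estimate would be. Pooling leaves from several trees with coverage weights is equivalent to averaging the per-tree mixtures, provided each tree's weights sum to the same total.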