We propose a way to transform conditional density estimation into a single nonparametric regression task by introducing auxiliary samples. This makes it possible to use regression methods that work well in high dimensions, such as neural networks and decision trees. Our main theoretical result characterizes and establishes the convergence of our estimator to the true conditional density in the large-sample limit. We develop condensité, a method that implements this approach. We demonstrate the benefit of the auxiliary samples on synthetic data and show that condensité can achieve good out-of-the-box results. We evaluate our method on a large population survey dataset and on a satellite imaging dataset. In both cases, we find that condensité matches or outperforms the state of the art and yields conditional densities in line with established findings in the literature on each dataset. Our contribution opens up new possibilities for regression-based conditional density estimation, and the empirical results indicate strong promise for applied research.
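The abstract does not spell out how the auxiliary samples enter the regression, so the following is only a minimal sketch of one generic construction of this kind, not the condensité algorithm itself. It assumes a Gaussian kernel target, uniformly drawn auxiliary responses, and a brute-force k-nearest-neighbour regressor standing in for a neural network or tree ensemble. The toy data, the bandwidth `h`, the number of auxiliary draws `m`, and all function names are illustrative assumptions. The idea is that regressing the kernel value K_h(y' - Y) on the pair (X, y') targets the conditional density of Y given X smoothed by the kernel, which approaches the density itself as `h` shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): Y | X = x ~ Normal(x, 0.5^2).
n = 5000
x = rng.uniform(-1.0, 1.0, size=n)
y = x + 0.5 * rng.standard_normal(n)

# Auxiliary samples: m uniform draws per observation over a fixed y-range.
m, y_lo, y_hi, h = 4, -3.0, 3.0, 0.2
x_rep = np.repeat(x, m)
y_rep = np.repeat(y, m)
y_aux = rng.uniform(y_lo, y_hi, size=n * m)

# Regression target: Gaussian kernel between auxiliary and observed responses.
target = np.exp(-0.5 * ((y_aux - y_rep) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
features = np.column_stack([x_rep, y_aux])


def knn_regress(query, k=200):
    """Average the targets of the k nearest training points (brute force)."""
    d2 = ((query[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argpartition(d2, k, axis=1)[:, :k]
    return target[nearest].mean(axis=1)


def estimate_density(x0, y_grid):
    """Read off the fitted regression surface along y for a fixed x0."""
    query = np.column_stack([np.full_like(y_grid, x0), y_grid])
    return knn_regress(query)


y_grid = np.linspace(y_lo, y_hi, 61)
density = estimate_density(0.0, y_grid)

# Riemann-sum check that the estimate carries roughly unit mass.
mass = density.sum() * (y_grid[1] - y_grid[0])
```

In this sketch, `density` approximates the conditional density of Y at x = 0 on `y_grid`. Any regressor that accepts the stacked `(x, y_aux)` features could replace `knn_regress`; the nearest-neighbour choice only keeps the example dependency-free.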